Commit 00c4ad5

Pull out beluga
1 parent 2b2fe27 commit 00c4ad5

3 files changed: +34 -50 lines changed

docs/beluga.rst

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+=======
+Beluga
+=======
+
+Introduction
+------------
+
+DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factors binding, DNase I sensitivities and histone marks in multiple cell types, and further utilize this capability to predict the chromatin effects of sequence variants and prioritize regulatory variants.
+
+The 2019 version of DeepSEA, nicknamed '**Beluga**', can predict **2002** chromatin features. Beluga is described in:
+
+Jian Zhou, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong, and Olga G. Troyanskaya, **Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk**. Nature Genetics (2018).
+
+To determine if certain features (ie. transcription factors, marks, or cell types) are present/accounted for in the model, refer to the `supplemental feature table <https://s3-us-west-2.amazonaws.com/humanbase-dev/deepsea/examples/41588_2019_420_MOESM9_ESM.csv>`_ which has all the profiles used to train DeepSEA.

docs/deepsea.rst

Lines changed: 19 additions & 49 deletions
@@ -5,22 +5,18 @@ DeepSEA
 Introduction
 ------------

-DeepSEA is a deep learning-based algorithmic framework for predicting the chromatin effects of sequence alterations with single nucleotide sensitivity. DeepSEA can accurately predict the epigenetic state of a sequence, including transcription factors binding, DNase I sensitivities and histone marks in multiple cell types, and further utilize this capability to predict the chromatin effects of sequence variants and prioritize regulatory variants.
+Sei is a deep-learning-based framework for systematically predicting sequence regulatory activities and applying sequence information to understand human genetics data. Sei provides a global map from any sequence to regulatory activities, as represented by 40 sequence classes. Each sequence class integrates predictions for 21,907 chromatin profiles (transcription factor, histone marks, and chromatin accessibility profiles across a wide range of cell types) from the underlying Sei deep learning model. You can also find the Sei code repository here (https://github.com/FunctionLab/sei-framework) or read about our manuscript here (https://www.biorxiv.org/content/10.1101/2021.07.29.454384v1).

-The current version of DeepSEA, nicknamed '**Beluga**', can predict **2002** chromatin features. Beluga is described in:
+Sequence class-level variant effects are computed by comparing the predictions for the reference and the alternative alleles. A positive score indicates an increase in sequence class activity by the alternative allele and vice versa. Sequence class-level scores are computed by projecting the 21,907 chromatin profile predictions for the sequence to the unit vector that represents each sequence class.

-Jian Zhou, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong, and Olga G. Troyanskaya, **Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk**. Nature Genetics (2018).
+For older DeepSEA models see:
+`Beluga`_

-DeepSEA is originally described in the following manuscript:
-
-Jian Zhou, Olga G. Troyanskaya. **Predicting the Effects of Noncoding Variants with Deep learning-based Sequence Model.** Nature Methods (2015).
-
-To determine if certain features (ie. transcription factors, marks, or cell types) are present/accounted for in the model, refer to the `supplemental feature table <https://s3-us-west-2.amazonaws.com/humanbase-dev/deepsea/examples/41588_2019_420_MOESM9_ESM.csv>`_ which has all the profiles used to train DeepSEA.

 Input
 -----

-DeepSEA predicts genomic variant effects on a wide range of chromatin features at the variant position (Transcription factors binding, DNase I hypersensitive sites, and histone marks in multiple human cell types). DeepSEA can also be utilized for predicting chromatin features for any DNA sequence.
+DeepSEA predicts genomic variant effects on a wide range of chromatin features at the variant position (Transcription factors binding, DNase I hypersensitive sites, and histone marks in multiple human cell types). DeepSEA can also be ultilized for predicting chromatin features for any DNA sequence.

 File formats
 ~~~~~~~~~~~~
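
Note on the sequence class scores described in the new docs/deepsea.rst introduction: the projection onto a unit vector and the reference/alternative comparison could be sketched roughly as below. This is an illustrative sketch, not the Sei implementation. The function names (`unit_vector`, `project`, `variant_effect`) are hypothetical, the comparison is assumed to be a plain difference (alternative minus reference), and the toy vectors use 3 profiles instead of 21,907.

```python
import math


def unit_vector(v):
    """Scale v to length 1 (hypothetical helper, not part of Sei)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def project(profile_preds, class_vector):
    """Scalar projection of the profile predictions onto one class direction."""
    if len(profile_preds) != len(class_vector):
        raise ValueError("prediction and class vectors differ in length")
    u = unit_vector(class_vector)
    return sum(p * w for p, w in zip(profile_preds, u))


def variant_effect(ref_preds, alt_preds, class_vectors):
    """Per-class score difference, assumed here to be alt minus ref.

    A positive value means the alternative allele raises the class score.
    """
    return {
        name: project(alt_preds, vec) - project(ref_preds, vec)
        for name, vec in class_vectors.items()
    }


# Toy data: 3 chromatin profiles and 2 made-up sequence classes.
toy_classes = {"class_A": [3.0, 4.0, 0.0], "class_B": [0.0, 0.0, 2.0]}
ref = [0.1, 0.2, 0.3]
alt = [0.4, 0.6, 0.3]
effects = variant_effect(ref, alt, toy_classes)
```

In this toy case the alternative allele raises the `class_A` score and leaves `class_B` unchanged, which matches the sign convention stated in the text (positive means increased activity for the alternative allele).
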
@@ -31,46 +27,20 @@ We support three types of input: vcf, fasta, bed. If you want to predict effects
 **Fasta format** input should include sequences of 2000bp length each. If a sequence is longer than 2000bp, only the center 2000bp will be used. A minimal example is ::

 >TestSequence
-TGGGATTACAGGCGTGAGCCACCGCGCCCGGCCCATTGTACCATTCTTAT
-GCCTTTGCGTCCTCATAGCTTAGCTCCCGTATATCAGTGAGAACATACTA
-TGTTTGGTTTTCCATACCCGAGTTACTTCACTTAGAATAATAGTCTCCAA
-TTTCATCCAGGTCAGTGCAAATGCGTTAATTCGTTCCTTTTATGGCTGAG
-TAGTATTCCATCATATATATATACTACAGTTTCTTTATCCACTCGTAAAT
-TGATGGGCATTTGTGTTGGAACACTTCTCCACTGCTGGTGGGAATGTAAA
-TTAGTGCAGCCACTATGGATAACAGTGTGGAGATTTGTTAAAGAACTAAA
-ACTAGAACTACCATTTGATCCAGCAATCCCACTACTGGGTATCTACCCAG
-AAGAAAAGAAGTCATTATTTGAAAAAGATACTTGCACGGGCATGTTTATA
-GCAGCACAATTCACAATTGTAGTTGTATTTCTTTAAGCGTGTCTTTTCAA
-TATCTCTCATGTTTCTGGTATAGATGGTATATATGTTAATCTTGTTCCTG
-AGGTCTGTTTTTTATTTTTGTCATTAAAGTGGGAATTAAATAGTTTTGTA
-GTGCATATAAATTAAAGAAAAAGTTCACATAAGCATATTTGCCAATCATC
-TCAAAATGCTATATTCTCCTTCACGGTTTTGAAAATAATTCAGGGTTTTC
-TCTTCCTCATTGCTTTCCCACCAACTGACAGTATTATTTTCTTAGTCATT
-TTACTGACCTTTGAAATTACTCCTTTGAGGTCTTCTAAAAAATTTTATGG
-GCTCTGCTGCTTTTTGGTGGCCTCCTTGTATCATTTATTCTATTACAGGA
-CGACTTACAAAAGGAAGCACATAAATTGACCCATATACATATCCTATCAT
-TGGGGAGTTTCTGTGCAAATGTTATTTATTGGAAGCTATTACTAAGAATT
-GTAAGAAAAATAATTGGTATTGATGCAGCTAGTATGGTTCCTGTAATTAT
-CGTACTCAGCCACGTAAATCATAGCTATATGTAGCCAAAGATCCATGAAC
-AAAATTTCCAGTAACATCATTATAATTCAAAAGGCAGACTTTCAGAACCA
-GACAGACTTGAATTTAAATTCTAGCTTTACCACACATGAATTTAACCTTG
-TGGAAGGTTAACCTATCTAAACTCATGTTTCTTCATTGGTAGCTGATAAA
-ATTAAGGATCATGTATATAACCACCTAGTAGAGTTGTTTAAGAAACTGTT
-AGAATTCCATAAATTGTTAGTATTAATGAGTTTTTGTTGGACATGTGTTA
-GGCTAGGCCACTCCTTGACCTTCATAGAGGTATGGATTATGACACAAATT
-CTAAACTGTAGGTAGGCATGGCTTTGTAGCAAGTATTAAAATAGTAAATA
-TTTTATTTTTATAAGATAAATGTAAACCTTTTAAAAGTTTCATTACATTT
-GTATTTATGAAATATCATCCTATATCAACTATAGAGAGAAGATCGCAAGA
-AGGCAGTGGCAGCAGAGGCTCCAGTTAGGAGGCTACTAGTCCAAATACAT
-TGCGATAAAAACTTGGCAAAAGGTGCTGGTAGTCTGATGAAATAAAGTAG
-ATAAATTTTAGAGGTATTTATAAAATAATTAAAGAATATTCAATAATAGG
-AGATATATTACCCAATAGAGTGGAGATTCAAAGATAACTCCGAAAGTTTT
-TTGCTAAAGCAACATTTGGCTGTGCTATCATTTACTAAGAAAGACAACAA
-GAGAGTAAAATCAAGTTTGAGGATGAAGTGAATTTATTCCTTTTTGATTG
-ATACATAATTGACATGTAATAAAACCCACAATGTTAAGAGTTCGGTTTGA
-TGTGCTTGACTATTTTAGGCACTGGTGTTATCACAACACAAGACAACAGA
-TAGGACATTCTCAGAAAATTTTTTCATGTCCCTTTCCAGTCAGTTTCAAG
-CCTTCTTTCCATGCAATAATTTTCTCACTTTGCCATTCTAGTAGGTGTGA
+TATCTCTCATGTTTCTGGTATAGATGGTATATATGTTAATCTTGTTCCTGAGGTCTGTTTTTTATTTTTGTCATTAAAGT
+GGGAATTAAATAGTTTTGTAGTGCATATAAATTAAAGAAAAAGTTCACATAAGCATATTTGCCAATCATCTCAAAATGCT
+ATATTCTCCTTCACGGTTTTGAAAATAATTCAGGGTTTTCTCTTCCTCATTGCTTTCCCACCAACTGACAGTATTATTTT
+CTTAGTCATTTTACTGACCTTTGAAATTACTCCTTTGAGGTCTTCTAAAAAATTTTATGGGCTCTGCTGCTTTTTGGTGG
+CCTCCTTGTATCATTTATTCTATTACAGGACGACTTACAAAAGGAAGCACATAAATTGACCCATATACATATCCTATCAT
+TGGGGAGTTTCTGTGCAAATGTTATTTATTGGAAGCTATTACTAAGAATTGTAAGAAAAATAATTGGTATTGATGCAGCT
+AGTATGGTTCCTGTAATTATCGTACTCAGCCACGTAAATCATAGCTATATGTAGCCAAAGATCCATGAACAAAATTTCCA
+GTAACATCATTATAATTCAAAAGGCAGACTTTCAGAACCAGACAGACTTGAATTTAAATTCTAGCTTTACCACACATGAA
+TTTAACCTTGTGGAAGGTTAACCTATCTAAACTCATGTTTCTTCATTGGTAGCTGATAAAATTAAGGATCATGTATATAA
+CCACCTAGTAGAGTTGTTTAAGAAACTGTTAGAATTCCATAAATTGTTAGTATTAATGAGTTTTTGTTGGACATGTGTTA
+GGCTAGGCCACTCCTTGACCTTCATAGAGGTATGGATTATGACACAAATTCTAAACTGTAGGTAGGCATGGCTTTGTAGC
+AAGTATTAAAATAGTAAATATTTTATTTTTATAAGATAAATGTAAACCTTTTAAAAGTTTCATTACATTTGTATTTATGA
+AATATCATCCTATATCAACTATAGAGAGAAGATCGCAAGA
+

 **Bed format** provides another way to specify sequences in human reference genome (hg19). The bed input should specify 2000bp-length regions. A minimal example is ``chr1 109817091 109819090``. The three columns are chromosome, start position, and end position.
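
The Fasta rule in the docs/deepsea.rst text (only the center 2000bp of a longer sequence is used) can be illustrated with a short Python sketch. It rests on stated assumptions: `read_fasta`, `center_crop`, and `parse_bed` are hypothetical helper names, the handling of an odd number of excess bases (the extra base is dropped from the right) and of sequences shorter than 2000bp (rejected) is assumed rather than documented, and the server's actual code may differ.

```python
WINDOW = 2000  # sequence length stated in the docs


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def center_crop(seq, window=WINDOW):
    """Return the central `window` bases of seq.

    Assumption: with an odd excess, the extra base is dropped on the right.
    Assumption: shorter sequences are rejected rather than padded.
    """
    if len(seq) < window:
        raise ValueError(f"sequence has {len(seq)}bp, need at least {window}bp")
    start = (len(seq) - window) // 2
    return seq[start:start + window]


def parse_bed(line):
    """Split a three-column BED line into (chromosome, start, end)."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


records = read_fasta(">TestSequence\nACGT\nTTGA\n")
cropped = center_crop("A" * 500 + "C" * 2000 + "G" * 500)
chrom, start, end = parse_bed("chr1 109817091 109819090")
```

For the documented Bed example ``chr1 109817091 109819090``, `end - start` is 1999, so the region spans 2000bp only if the end coordinate is read as inclusive. The text does not say which coordinate convention applies.
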

docs/tissue-networks.rst

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ Examples

 IL1B in blood vessel
 ~~~~~~~~~~~~~~~~~~~~~~~~~
-We examined and experimentally verified the tissue-specific molecular response of blood vessel cells to stimulation by IL-1β (IL1B), a proinflammatory cytokine. We anticipated that the genes most tightly connected to IL1B in the blood vessel network would be among those responding to IL-1β stimulation in blood vessel cells. We tested this hypothesis by profiling the gene expression of human aortic smooth muscle cells (HASMCs; the predominant cell type in blood vessels) stimulated with IL-1β.
+We examined and experimentally verified the tissue-specific molecular response of blood vessel cells to stimulation by IL-1β (IL1B), a pro-inflammatory cytokine. We anticipated that the genes most tightly connected to IL1B in the blood vessel network would be among those responding to IL-1β stimulation in blood vessel cells. We tested this hypothesis by profiling the gene expression of human aortic smooth muscle cells (HASMCs; the predominant cell type in blood vessels) stimulated with IL-1β.

 Examination of the genes whose expression was significantly upregulated at 2 h after stimulation showed that 18 of the 20 IL1B network neighbors were among the top 500 most upregulated genes in the experiment (P = 2.07 × 10−23). The blood vessel network was the most accurate tissue network in predicting this experimental outcome; none of the other 143 tissue-specific networks or the tissue-naive network performed as well when evaluated by each network's ability to predict the result of IL-1β stimulation on the cells.
