2015
DOI: 10.1038/nbt.3300

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

Abstract: Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other…

Citations: cited by 2,491 publications (2,508 citation statements)
References: 47 publications
“…For example, DeepBind is an algorithm that uses deep learning to predict the sequence specificity of RNA- and DNA-binding proteins. [66] DeepBind is trained on MAVE data, including DNA-binding assays such as SELEX, protein-binding microarrays, and RNA-binding assays such as RNAcompete. [67][68][69] MAVE data can also be useful in evaluating new predictive tools, as was done for EVmutation, which predicts variant effects in proteins from co-variation in multiple-sequence alignments.…”
Section: Limitations of MAVEs and How to Overcome Them (mentioning)
confidence: 99%
“…This difficulty is most clearly reflected in the distribution of mutations listed in databases of Mendelian disorders, such as the Human Gene Mutation Database, where most mutations are found within coding regions (86%) or at intronic splice sites (11%), with only a small fraction (3%) identified in regulatory regions (14). Newer methods for annotating and predicting the impact of noncoding (NC) variants have provided substantial improvements (15,16), but experimental validation of the presumed effects remains critical for the determination of pathogenicity and elucidation of the mechanism of action (13,17).…”
Section: Mendelian Erythroid Disorders (mentioning)
confidence: 99%
“…First, we trained a gapped k-mer support vector machine (gkmer-SVM) on EB and K562 open chromatin data and used delta-SVM to predict single nucleotide effects (16). We then used already-trained models for the TFs GATA1, TAL1, KLF1, and NFE2 from DeepBind, a convolutional neural network approach (15). For each CRE proximal to the 20 MED genes, we created a "mutation map" of the predicted effects of all possible single nucleotide changes (15).…”
Section: Understanding and Predicting the Effects of NC Mutations On (mentioning)
confidence: 99%
“…On the robotic area, Ian Lenz, Honglak Lee and Ashutosh Saxena are using deep learning methods in order to solve the problem of detecting robotic grasps in an RGB-D view of a scene containing objects [15]. Bioinformatics area, Deep learning using for recognizing disease from DNA sequences [16] or predicting protein secondary structure [17]. Another state-of-art study is on largescale video classification with Convolutional Neural Networks (CNN) using a new dataset of 1 million YouTube videos belonging to 487 classes [18].…”
Section: Deep Learning Algorithms (mentioning)
confidence: 99%