Representation learning of genomic sequence motifs with convolutional neural networks

Koo, Peter K.; Eddy, Sean R.

doi:10.1371/journal.pcbi.1007560

Cited by 82 publications

(122 citation statements)

References 23 publications

Citation statement types: Supporting; Mentioning (118); Contrasting


“…To generate data-driven hypotheses, first-order and second-order attribution methods can be employed to identify important local features. Because attribution maps can be noisy, it may be beneficial to employ CNNs that are designed to learn more interpretable representations in first layer filters (Koo & Eddy, 2019). It turns out that CNNs designed to learn interpretable filters also yield more reliable representations with attribution methods.…”

Section: Results (mentioning)

confidence: 99%

“…Recent advances have made it possible to intentionally design CNNs to learn more human-interpretable patterns in convolutional filters. This includes design principles based on spatial information flow through the network and employing highly divergent activation functions such as the exponential function (Koo & Eddy, 2019; Koo & Ploenzke, 2019). In parallel, advances have been developed to make direct weight visualization more interpretable (Ploenzke & Irizarry, 2018).…”

Section: Global Interpretability (mentioning)

confidence: 99%


Interpreting Deep Neural Networks Beyond Attribution Methods: Quantifying Global Importance of Genomic Features

Koo; Ploenzke (2020). Preprint. Self cite.

Despite deep neural networks (DNNs) having found great success at improving performance on various prediction tasks in computational genomics, it remains difficult to understand why they make any given prediction. In genomics, the main approaches to interpret a high-performing DNN are to visualize learned representations via weight visualizations and attribution methods. While these methods can be informative, each has strong limitations. For instance, attribution methods only uncover the independent contribution of single nucleotide variants in a given sequence. Here we discuss and argue for global importance analysis which can quantify population-level importance of putative features and their interactions learned by a DNN. We highlight recent work that has benefited from this interpretability approach and then discuss connections between global importance analysis and causality.


“…These results suggest that while the use of rational features may facilitate the abstraction of potentially relevant information of toehold switch function, the one-hot sequence-only MLP model can recover such information without a priori hypothesis-driven assumptions built into the model if given sufficient training data. In order to evaluate the degree of biological generalization in our sequence-only MLP model, we performed two additional rounds of validation. First, we iteratively withheld each of the 23 tiled viral genomes in the dataset during training and predicted their function as test sets, resulting in a 0.82-0.98 AUROC range (average 0.87, Fig.…”

Section: Improved Prediction Using Sequence-based Multilayer Perceptr… (mentioning)

confidence: 99%

“…By contrast, mechanistic hypothesis-driven models can more directly inform which aspects of a biological theory best explain the observations. Various methods have been established to address this limitation, including alternative network architectures (39), and the use of saliency maps (40,41), which reveal the regions of an input that deep learning models weigh most heavily and therefore pay the most attention to when making predictions. While saliency maps have been previously used to visualize model attention in one-hot representations of sequence data (10,17,18,20,40), such implementations focus only on the primary sequence and have not been developed to identify secondary structure interactions, which are especially relevant in the operation of RNA synthetic biology elements.…”

Section: Visualizing Learned RNA Secondary Structure Motifs With Vis4… (mentioning)

confidence: 99%

“…In the few cases where secondary structure has been investigated, input representations have been constrained to predetermined structures based on the predictions of thermodynamic models (37, 38) whose abstractions we have found cause significant information loss. We sought to visualize RNA secondary structures learned by our neural networks in a manner unconstrained by thermodynamic modeling. To achieve this, we trained a CNN on a two-dimensional nucleotide complementarity map representation (Fig.…”

Section: Visualizing Learned RNA Secondary Structure Motifs With Vis4… (mentioning)

confidence: 99%


Deep Learning for RNA Synthetic Biology

et al. (2019). Preprint.

Engineered RNA elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids. Predicting the behavior of these tools remains a challenge, a situation that could be addressed through enhanced pattern recognition from deep learning. Thus, we investigate Deep Neural Networks (DNN) to predict toehold switch function as a canonical riboswitch model in synthetic biology. To facilitate DNN training, we synthesized and characterized in vivo a dataset of 91,534 toehold switches spanning 23 viral genomes and 906 human transcription factors. DNNs trained on nucleotide sequences outperformed (R² = 0.43-0.70) previous state-of-the-art thermodynamic and kinetic models (R² = 0.04-0.15) and allowed for human-understandable attention-visualizations (VIS4Map) to identify success and failure modes. This deep learning approach constitutes a major step forward in engineering and understanding of RNA synthetic biology.

One Sentence Summary: Deep neural networks are used to improve functionality prediction and provide insights on toehold switches as a model for RNA synthetic biology tools.


ResidualBind: Uncovering Sequence-Structure Preferences of RNA-Binding Proteins with Deep Neural Networks

Koo; Ploenzke; Paul; et al. (2023). Methods in Molecular Biology.
