2020
DOI: 10.1101/2020.09.08.288068
Preprint

Global Importance Analysis: An Interpretability Method to Quantify Importance of Genomic Features in Deep Neural Networks

Abstract: Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. For model interpretability, attribution methods have been employed to reveal learned patterns that resemble sequence motifs. However, first-order attribution methods only quantify the independent importance of single-nucleotide variants in a given sequence; they do not provide the effect size of motifs (or th…
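To make the quantity the abstract describes concrete, here is a minimal sketch of a difference-based global importance estimate over synthetic background sequences; the `model.predict` API, argument names, and background sampling are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def global_importance(model, backgrounds, motif, position):
    """Estimate the global importance of embedding `motif` at `position`.

    backgrounds: (N, L, 4) one-hot sequences drawn from a background
        distribution (e.g. dinucleotide-shuffled sequences).
    motif: (M, 4) one-hot pattern to embed.
    model: assumed to expose .predict(batch) -> (N,) predictions.
    Returns the mean change in prediction caused by embedding the motif.
    """
    with_motif = backgrounds.copy()
    with_motif[:, position:position + len(motif)] = motif

    # Global importance = E[f(x_with_motif) - f(x)] over the backgrounds.
    return np.mean(model.predict(with_motif) - model.predict(backgrounds))
```

Averaging over many backgrounds marginalizes out the surrounding sequence context, which is what distinguishes this global estimate from per-sequence, first-order attributions.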


Cited by 17 publications (25 citation statements) | References 68 publications
“…This, in combination with the large overlap of Nrf1 and Tcf12 binding sites, suggests that Nrf1 and Tcf12 binding may be regulated by logic beyond the presence or absence of their DNA-binding motifs. We also show that under class-imbalanced data, EPE and DEPE generate robust estimates of transcription factor effects, in contrast to Global Importance Analysis (15), which takes a related approach but uses a difference instead of a ratio to estimate the effect of a pattern (Fig S2).…”
Section: Results
confidence: 94%
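The contrast the statement draws can be written side by side; this is a minimal sketch, assuming f is the trained model's prediction, x a background sequence, and x_phi the same sequence with pattern phi embedded. The ratio form is an illustrative paraphrase of EPE/DEPE, not the authors' exact definition:

```latex
% Difference-based effect (Global Importance Analysis):
\mathrm{GI}(\phi) = \mathbb{E}_{x}\!\left[ f(x_\phi) - f(x) \right]

% Ratio-based effect (EPE/DEPE-style; illustrative paraphrase):
\mathrm{PE}(\phi) \approx \frac{\mathbb{E}_{x}\!\left[ f(x_\phi) \right]}{\mathbb{E}_{x}\!\left[ f(x) \right]}
```

Under class imbalance the baseline E[f(x)] can be small, so a fixed additive change means something different across conditions; normalizing by the baseline is what the citing authors argue makes the ratio more robust.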
“…In these cases, explicitly training models on each differential comparison between cell types would be computationally expensive and time-intensive. We also note that Expected Pattern Effect resembles other methods for extracting pattern effects from deep neural networks (3,15), except that we compute Expected Pattern Effect to permit the comparison of pattern effects between conditions, which is important for identifying cell type-specific or condition-specific sequence features, and we show that using a ratio to compare effects is more robust for analyzing cell type-specific transcription factor activity under class imbalance.…”
Section: Discussion
confidence: 96%
“…More recently, a new wave of algorithms has been introduced that use deep neural networks to predict RBP binding sites (Alipanahi et al., 2015; Ghanbari and Ohler, 2020; Grønning et al., 2020; Pan and Shen, 2018; Yan and Zhu, 2020). One challenge is to explain what these complex models have learned, although a multitude of methods for interpreting the learned models has recently been developed, for instance based on in silico mutagenesis, predictions on synthetic sequences, gradient tracing, and analysis of the convolutional filters (Alipanahi et al., 2015; Ghanbari and Ohler, 2020; Koo et al., 2020; Pan and Shen, 2018). However, with the increasing number of model parameters and network complexity, the risk grows that such models could also learn experimental biases in the datasets.…”
Section: Introduction
confidence: 99%
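As an illustration of the first interpretation strategy the statement lists, here is a minimal in silico mutagenesis sketch; the model object and its `.predict` method are assumptions standing in for any trained sequence model:

```python
import numpy as np

def in_silico_mutagenesis(model, x):
    """Score every single-nucleotide substitution in a one-hot sequence.

    x: (L, 4) one-hot encoded sequence.
    model: assumed to expose .predict(batch) -> (N,) predictions.
    Returns an (L, 4) array of prediction changes relative to the wild type.
    """
    L, A = x.shape
    wild_type = model.predict(x[np.newaxis])[0]

    # Build every possible point mutant of the input sequence.
    mutants = []
    for pos in range(L):
        for alt in range(A):
            m = x.copy()
            m[pos] = 0
            m[pos, alt] = 1
            mutants.append(m)
    scores = model.predict(np.stack(mutants))

    # Effect of each substitution = mutant score - wild-type score
    # (entries for the reference base at each position are zero).
    return scores.reshape(L, A) - wild_type
```

This yields a per-sequence, first-order importance map, which is exactly the local view that global approaches like GIA complement by averaging pattern effects over many background sequences.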