2022
DOI: 10.1101/2022.04.29.490059
Preprint

Evaluating deep learning for predicting epigenomic profiles

Abstract: Deep learning has been successful at predicting epigenomic profiles from DNA sequences. Most approaches frame this task as a binary classification relying on peak callers to define functional activity. Recently, quantitative models have emerged to directly predict the experimental coverage values as a regression. As new models continue to emerge with different architectures and training configurations, a major bottleneck is forming due to the lack of ability to fairly assess the novelty of proposed models and …
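To make the distinction in the abstract concrete, here is a minimal, illustrative PyTorch sketch (not taken from the paper) of the two task framings: a binary peak-classification head trained against peak-caller labels, and a quantitative head that regresses coverage values. All layer sizes and names are assumptions.

import torch
import torch.nn as nn

class BinaryPeakHead(nn.Module):
    """Binary framing: one peak / no-peak logit per track from pooled features."""
    def __init__(self, in_features, n_tracks):
        super().__init__()
        self.linear = nn.Linear(in_features, n_tracks)

    def forward(self, x):
        # Logits trained with nn.BCEWithLogitsLoss against peak-caller labels.
        return self.linear(x)

class QuantitativeHead(nn.Module):
    """Quantitative framing: regress normalized coverage per track."""
    def __init__(self, in_features, n_tracks):
        super().__init__()
        self.linear = nn.Linear(in_features, n_tracks)
        self.softplus = nn.Softplus()  # keeps predicted coverage non-negative

    def forward(self, x):
        # Trained with a regression loss (e.g. MSE or Poisson NLL) on coverage values.
        return self.softplus(self.linear(x))

features = torch.randn(8, 128)                    # pooled embeddings from a shared trunk
peak_logits = BinaryPeakHead(128, 15)(features)   # (8, 15) logits
coverage = QuantitativeHead(128, 15)(features)    # (8, 15) non-negative coverage values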

Cited by 11 publications (8 citation statements)
References 77 publications

“…9). We also found that gradient correction worked well for various CNNs trained to predict quantitative levels of normalized read-coverage of 15 ATAC-seq datasets at base-resolution 19 (Fig. 2e, Supplementary Fig.…”
mentioning
confidence: 70%
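The gradient correction referred to in this statement can be sketched under the assumption that it means zero-centering input gradients across the four nucleotide channels of a one-hot DNA sequence, which removes the gradient component that points off the one-hot simplex. A minimal NumPy illustration, with the (L, 4) array shape as an assumption:

import numpy as np

def correct_gradient(grad):
    """Zero-center an input gradient across the nucleotide channel axis.

    grad: array of shape (L, 4) holding d(output)/d(input) for a one-hot
    encoded DNA sequence of length L.
    """
    # Subtracting the per-position mean over the 4 channels removes the
    # gradient component orthogonal to the one-hot simplex.
    return grad - grad.mean(axis=-1, keepdims=True)

# Example with a random stand-in gradient for a 2 kb sequence.
g = np.random.randn(2000, 4)
g_corrected = correct_gradient(g)
assert np.allclose(g_corrected.sum(axis=-1), 0.0)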
“…We acquired the test data and the trained CNN-base and CNN-32 models with exponential activations and ReLU activations from Ref. 19 ; a total of 4 models. Each CNN takes as input 2kb length sequences and outputs a prediction of normalized read-coverage for 15 ATAC-seq bigWig tracks (i.e.…”
Section: Data
mentioning
confidence: 99%
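For readers unfamiliar with this interface, here is a hypothetical PyTorch sketch of a model with the stated input/output shape: a 2 kb one-hot sequence in, base-resolution normalized coverage for 15 ATAC-seq tracks out. The internal layers are placeholders, not the CNN-base or CNN-32 architectures from Ref. 19; the cited comparison swaps ReLU versus exponential activations within such models.

import torch
import torch.nn as nn

class CoverageCNN(nn.Module):
    """Toy sequence-to-coverage model: (batch, 4, 2000) -> (batch, 15, 2000)."""
    def __init__(self, n_tracks=15):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(4, 64, kernel_size=19, padding=9),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=7, padding=3),
            nn.ReLU(),
        )
        self.head = nn.Conv1d(64, n_tracks, kernel_size=1)
        self.softplus = nn.Softplus()  # keep predicted coverage non-negative

    def forward(self, x):
        # x: one-hot DNA, shape (batch, 4, 2000)
        return self.softplus(self.head(self.body(x)))

model = CoverageCNN()
dummy = torch.zeros(1, 4, 2000)
print(model(dummy).shape)  # torch.Size([1, 15, 2000])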
“…In the context of BPNet [2], the authors therefore performed peak calling on ChIP-nexus data to select a set of regions highly enriched in count signal. However, recent work by Toneyan et al [51] suggests that peak callers select sites too conservatively, which may result in under-fitting of sequence-to-signal models.…”
Section: Methods
mentioning
confidence: 99%
“…With image data, basic affine transformations can translate, magnify, or rotate an image without changing its label. For genomics, the available neutral augmentations are reverse-complement transformations 16 and small random translations of the input sequence 17,18. With the finite size of experimental data and a paucity of augmentation methods, there exist only limited strategies to promote generalization for genomic DNNs.…”
mentioning
confidence: 99%
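A minimal NumPy sketch of the two neutral augmentations named in this statement, reverse-complement and small random translations, applied to a one-hot sequence of shape (L, 4). The A, C, G, T channel ordering and zero-padding at the edges are assumptions:

import numpy as np

def reverse_complement(one_hot):
    """Reverse position order and swap complementary channels (A<->T, C<->G)."""
    # Reversing both axes flips the sequence and maps channels A,C,G,T -> T,G,C,A.
    return one_hot[::-1, ::-1].copy()

def random_shift(one_hot, max_shift=20, rng=None):
    """Translate the sequence by a small random offset, zero-padding the ends."""
    if rng is None:
        rng = np.random.default_rng()
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.zeros_like(one_hot)
    if shift > 0:
        out[shift:] = one_hot[:-shift]
    elif shift < 0:
        out[:shift] = one_hot[-shift:]
    else:
        out[:] = one_hot
    return out

seq = np.eye(4)[np.random.default_rng().integers(0, 4, size=2000)]  # random one-hot 2 kb sequence
augmented = random_shift(reverse_complement(seq))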
“…An important downstream application of genomic DNNs is scoring the functional consequences of mutations. Following previous procedures 2,17, we compared model predictions with saturation mutagenesis of 15 cis-regulatory elements measured experimentally with a massively parallel reporter assay – data collected through the CAGI5 Challenge 21. As expected, EvoAug-trained DNNs outperformed standard training on this out-of-distribution generalization task (Fig.…
mentioning
confidence: 99%
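The comparison described here rests on in silico saturation mutagenesis: every single-nucleotide substitution is scored by the change it induces in the model output, and those scores are then correlated with the experimentally measured effects. A rough sketch, assuming a PyTorch model that maps a (1, 4, L) tensor to an output that can be reduced to a scalar readout by summation:

import numpy as np
import torch

def saturation_mutagenesis_scores(model, one_hot):
    """Score every single-nucleotide substitution of a reference sequence.

    one_hot: NumPy array of shape (L, 4). Returns an (L, 4) array where entry
    (i, b) is model(sequence with base b at position i) minus model(reference).
    """
    L = one_hot.shape[0]
    ref = torch.tensor(one_hot.T[None], dtype=torch.float32)  # (1, 4, L)
    with torch.no_grad():
        ref_score = model(ref).sum().item()
    scores = np.zeros((L, 4))
    for i in range(L):
        for b in range(4):
            mutant = one_hot.copy()
            mutant[i] = 0.0
            mutant[i, b] = 1.0
            x = torch.tensor(mutant.T[None], dtype=torch.float32)
            with torch.no_grad():
                scores[i, b] = model(x).sum().item() - ref_score
    return scores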