The Kipoi repository accelerates community exchange and reuse of predictive models for genomics

Avsec, Žiga; Kreuzhuber, Roman; Israeli, Johnny; Xu, Nancy; Cheng, Jianghua; Shrikumar, Avanti; Banerjee, Abhimanyu; Kim, Daniel Sunwook; Beier, Thorsten; Urban, Lara; Kundaje, Anshul; Stegle, Oliver

doi:10.1038/s41587-019-0140-0

Cited by 129 publications

(111 citation statements)

References 24 publications

Mentioning: 109


“…Transfer learning has been shown to dramatically reduce the amount of training needed for related classification tasks and improves the overall predictive performance compared to training from scratch 28 . In the pre-training step, we trained a CNN on 4,863,024 1 kb sequences annotated with a total of 919 ChIP-seq and DNase-seq profiles collected from ENCODE 26 and the Epigenomics Roadmap Project 29 across dozens of cell types ( Methods ).…”

Section: Predicting Binding Status Of Transcription Factor Motif Occu… (mentioning)

confidence: 99%

Deep neural networks identify context-specific determinants of transcription factor binding affinity

Zheng, Lamkin, et al. 2020 (preprint)


Transcription factors (TFs) bind DNA by recognizing highly specific DNA sequence motifs, typically of length 6-12bp. A TF motif can occur tens of thousands of times in the human genome, but only a small fraction of those sites are actually bound. Despite the availability of genome-wide TF binding maps for hundreds of TFs, predicting whether a given motif occurrence is bound and identifying the influential context features remain challenging. Here we present a machine learning framework leveraging existing convolutional neural network architectures and state of the art model interpretation techniques to identify, visualize, and interpret context features most important for determining binding activity for a particular TF. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line and achieve superior classification performance compared to existing frameworks. We compute importance scores for context regions at single base pair resolution and uncover known and novel determinants of TF binding. Finally, we demonstrate that important context bases are under increased purifying selection compared to nearby bases and are enriched in disease-associated variants identified by genome-wide association studies.



“…The mutation map allows assessing the relative importance of variants compared with other possible variants in the vicinity. The MMSplice implementation followed the Kipoi API (version 0.65), a programmatic standard for predictive models in genomics (Avsec et al, ). In particular, it is compatible with the Kipoi variant effect prediction plugin allowing the generation of mutation maps.…”

Section: Results (mentioning)

confidence: 99%

CAGI 5 splicing challenge: Improved exon skipping and intron retention predictions with MMSplice

et al. 2019 (self-citation)


Pathogenic genetic variants often primarily affect splicing. However, it remains difficult to quantitatively predict whether and how genetic variants affect splicing. In 2018, the fifth edition of the Critical Assessment of Genome Interpretation proposed two splicing prediction challenges based on experimental perturbation assays: Vex‐seq, assessing exon skipping, and MaPSy, assessing splicing efficiency. We developed a modular modeling framework, MMSplice, the performance of which was among the best on both challenges. Here we provide insights into the modeling assumptions of MMSplice and its individual modules. We furthermore illustrate how MMSplice can be applied in practice for individual genome interpretation, using the MMSplice VEP plugin and the Kipoi variant interpretation plugin, which are directly applicable to VCF files.
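
The mutation map mentioned in the citation statement above can be illustrated with a minimal sketch. This is not the Kipoi or MMSplice implementation: `mutation_map` and `toy_score` are hypothetical names, and the scoring function is a stand-in for a real model's prediction. The idea shown is only the general one, scoring every possible single-base substitution against the reference sequence.

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def mutation_map(seq: str, score: Callable[[str], float]) -> List[Dict[str, float]]:
    """For each position, map every alternative base to (mutated score - reference score)."""
    ref_score = score(seq)
    result: List[Dict[str, float]] = []
    for i, ref_base in enumerate(seq):
        effects: Dict[str, float] = {}
        for alt in BASES:
            if alt == ref_base:
                continue  # only true substitutions are scored
            mutated = seq[:i] + alt + seq[i + 1:]
            effects[alt] = score(mutated) - ref_score
        result.append(effects)
    return result


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a model: counts occurrences of the dinucleotide 'GT'."""
    return float(seq.count("GT"))


mmap = mutation_map("AGTA", toy_score)
print(mmap[1])  # effects of substituting the 'G' at index 1
```

With a real model, `toy_score` would be replaced by a call that returns the model's prediction for a sequence; the per-position dictionaries can then be plotted as a heat map to compare a variant of interest with its neighbours.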


“…These models have been integrated into the Kipoi API [30], allowing them to be applied with very little overhead to a VCF file containing human variant data (see also Figure 6). As a result the models are easy to use and straightforward to integrate into existing variant annotation pipelines.…”

Section: Modelling 5'UTR Of Any Length Using Frame Pooling (mentioning)

confidence: 99%

Predicting Mean Ribosome Load for 5’UTR of any length using Deep Learning

Karollus, Avsec, Gagneur 2020 (preprint, self-citation)


The 5' untranslated region plays a key role in regulating mRNA translation and consequently protein abundance. Therefore, accurate modeling of 5'UTR regulatory sequences shall provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL), a proxy for translation rate, directly from 5'UTR sequence with a high degree of accuracy. However, this model is restricted to sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human sequences without a substantial loss of information. Here, we introduced frame pooling, a novel neural network operation that enabled the development of an MRL prediction model for 5'UTRs of any length. Our model shows state-of-the-art performance on fixed length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5'UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released open source, could help pinpoint pathogenic genetic variants.
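
The abstract above describes frame pooling only as an operation that lets a model accept 5'UTRs of any length. The sketch below shows one way such an operation could work, not the authors' implementation. It rests on labeled assumptions: positions are assigned to one of three reading frames by their distance from the 3' end of the sequence, and each frame is summarized by a global maximum and a global mean per channel. The function name `frame_pool` is hypothetical.

```python
from typing import List


def frame_pool(features: List[List[float]]) -> List[float]:
    """Pool a (positions x channels) feature map into a fixed-length vector.

    Assumption: rows are ordered 5'->3', and the frame of a position is its
    distance from the 3' end modulo 3. The output always has 3 * channels * 2
    entries (max and mean per channel per frame), whatever the input length.
    """
    n_positions = len(features)
    n_channels = len(features[0])
    pooled: List[float] = []
    for frame in range(3):
        rows = [features[i] for i in range(n_positions)
                if (n_positions - 1 - i) % 3 == frame]
        for c in range(n_channels):
            column = [row[c] for row in rows]
            # A frame is empty only for inputs shorter than 3 positions.
            pooled.append(max(column) if column else 0.0)
            pooled.append(sum(column) / len(column) if column else 0.0)
    return pooled


short = frame_pool([[1.0], [2.0], [3.0], [4.0]])
longer = frame_pool([[float(i)] for i in range(10)])
print(len(short), len(longer))  # both vectors have the same length
```

Because the pooled vector has the same size for every input length, any dense layers placed after it do not need to know how long the sequence was.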

