Sequential regulatory activity prediction across chromosomes with convolutional neural networks

Kelley, David R.; Reshef, Yakir; Bileschi, Maxwell L; Belanger, David; McLean, Cory Y.; Snoek, Jasper

doi:10.1101/gr.227819.117

Cited by 384 publications

(439 citation statements)

References 59 publications

Citation statement types: Supporting, Mentioning (405), Contrasting, Unclassified

“…The standard approach for mapping motif instances bound by TFs in vivo is to extract bound regions from chromatin immunoprecipitation experiments coupled to sequencing (ChIP-seq) using peak-callers [34][35][36][37][38][39] and identify over-represented motifs in these sequences as position weight matrix models (PWM) [40][41][42][43]. While CNNs are ideally suited to model TF binding from motif combinations and their syntax, current models have limited resolution. State-of-the-art CNN models of TF binding predict binary binding events [25][26][27] or low-resolution continuous binding signal averaged across 100-200 bp windows 44.…”

Section: Introduction (mentioning)

confidence: 99%
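
To make the resolution contrast in the statement above concrete, here is a minimal sketch of what averaging a base-resolution signal over fixed windows does to a narrow peak. The function name `bin_signal`, the toy coverage values, and the 128 bp window (a value inside the quoted 100-200 bp range) are illustrative assumptions, not taken from any of the cited models.

```python
from statistics import mean


def bin_signal(signal, window):
    """Average a per-base signal over consecutive, non-overlapping windows.

    A trailing partial window is averaged over the bases it actually has.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [mean(signal[i:i + window]) for i in range(0, len(signal), window)]


# Toy per-base coverage (hypothetical): a sharp 10 bp peak in a 256 bp region.
coverage = [0.0] * 256
for pos in range(60, 70):
    coverage[pos] = 64.0

binned = bin_signal(coverage, window=128)
print(binned)  # one value per 128 bp bin; the peak's position within its bin is lost
```

After binning, the 10 bp peak contributes only to the mean of its bin, so its exact position and shape can no longer be recovered. That is the limitation the quoted passage attributes to window-averaged models.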

Base-resolution models of transcription factor binding reveal soft motif syntax

Avsec, Weilert, Shrikumar, et al. 2019

Preprint

116

244

Genes are regulated through enhancer sequences, in which transcription factor binding motifs and their specific arrangements (syntax) form a cis-regulatory code. To understand the relationship between motif syntax and transcription factor binding, we train a deep learning model that uses DNA sequence to predict base-resolution binding profiles of four pluripotency transcription factors Oct4, Sox2, Nanog, and Klf4. We interpret the model to accurately map hundreds of thousands of motifs in the genome, learn novel motif representations and identify rules by which motifs and syntax influence transcription factor binding. We find that instances of strict motif spacing are largely due to retrotransposons, but that soft motif syntax influences motif interactions at protein and nucleosome range. Most strikingly, Nanog binding is driven by motifs with a strong preference for ~10.5 bp spacings corresponding to helical periodicity. Interpreting deep learning models applied to high-resolution binding data is a powerful and versatile approach to uncover the motifs and syntax of cis-regulatory sequences.

“…Each layer had 6 distinct computational operations: 1D convolution with filter size 4 or 8 (conv4, conv8), dilated 1D convolution with rate 10 and filter size 4 or 8 (dconv4, dconv8), max-pooling or average pooling with size 4 (maxpool, avgpool). These hyperparameters for computational operations were selected based on previous works 4, 10. Moreover, we added an identity mapping to each layer that maps input identically to output without any computations (identity), for potentially reducing the child model complexity.…”

Section: Designing Model Search Space (mentioning)

confidence: 99%
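
The candidate operations listed in the statement above can be sketched in plain Python as follows. This is a single-channel illustration under stated assumptions, not the cited framework's implementation: the helper names `conv1d`, `pool1d`, and `average`, the `SEARCH_SPACE` dictionary, the all-ones kernels, and the absence of padding are all hypothetical choices made for readability. Only the operation names, filter sizes (4 and 8), dilation rate (10), and pool size (4) come from the quoted text.

```python
def conv1d(x, kernel, dilation=1):
    """'Valid' 1D convolution (cross-correlation) of a single-channel signal."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    return [
        sum(w * x[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(x) - span + 1)
    ]


def pool1d(x, size, reduce):
    """Non-overlapping pooling; a trailing partial window is dropped."""
    return [reduce(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def average(window):
    return sum(window) / len(window)


# Candidate operations, named as in the quoted search space. Kernel weights
# are fixed placeholders (all ones); in a trained model they would be learned.
SEARCH_SPACE = {
    "conv4": lambda x: conv1d(x, [1.0] * 4),
    "conv8": lambda x: conv1d(x, [1.0] * 8),
    "dconv4": lambda x: conv1d(x, [1.0] * 4, dilation=10),
    "dconv8": lambda x: conv1d(x, [1.0] * 8, dilation=10),
    "maxpool": lambda x: pool1d(x, 4, max),
    "avgpool": lambda x: pool1d(x, 4, average),
    "identity": lambda x: list(x),
}

signal = [float(i % 5) for i in range(100)]
for name, op in SEARCH_SPACE.items():
    print(name, len(op(signal)))
```

With dilation 10, a size-8 filter spans 71 input positions while still using only 8 weights. The output lengths differ between operations only because this sketch uses no padding.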

“…The successful applications of CNNs have been largely attributed to their corresponding architectures. Indeed, for CNN applications in genomics and biomedicine, numerous efforts have been devoted to the development of architectures, such as in DeepSEA 4, Basenji 10 and SpliceAI 7. This is similar to the extensive efforts in architecture designs for tackling computer vision problems, for example VGG 11, Inception 12, and ResNet 13.…”

mentioning

confidence: 99%

An automated framework for efficiently designing deep convolutional neural networks in genomics

Park, Theesfeld, Troyanskaya 2020

Preprint

Convolutional neural networks (CNNs) have become a standard for analysis of biological sequences. Tuning of network architectures is essential for CNN's performance, yet it requires substantial knowledge of machine learning and commitment of time and effort. This process thus imposes a major barrier to broad and effective application of modern deep learning in genomics. Here, we present AMBER, a fully automated framework to efficiently design and apply CNNs for genomic sequences. AMBER designs optimal models for user-specified biological questions through the state-of-the-art Neural Architecture Search (NAS). We applied AMBER to the task of modelling genomic regulatory features and demonstrated that the predictions of the AMBER-designed model are significantly more accurate than the equivalent baseline non-NAS models and match or even exceed published expert-designed models. Interpretation of AMBER architecture search revealed its design principles of utilizing the full space of computational operations for accurately modelling genomic sequences. Furthermore, we illustrated the use of AMBER to accurately discover functional genomic variants in allele-specific binding and disease heritability enrichment. AMBER provides an efficient automated method for designing accurate deep learning models in genomics.

“…Convolutional neural networks are fully connected neural networks that use two-dimensional matrices, called windows, to map the data, attempting to mimic the neurons of the visual cortex of the human brain (Figure 5) (Eraslan et al., 2019; Wainberg et al., 2018). These networks have been used for the classification of transcription factor binding sites (Zou et al., 2016; Wang, Tai, E, & Wei, 2018), the prediction of molecular phenotypes (Kelley et al., 2018), DNA methylation (Zhou et al., 2018), and the analysis of gene expression and microRNA (Budach & Marsico, 2018). Recurrent neural networks are used when working with dynamic data that can change over time.…”

Section: Supervised Deep Learning (Aprendizaje Profundo Supervisado) (unclassified)

Aprendizaje de máquina y aprendizaje profundo en biotecnología: aplicaciones, impactos y desafíos

Franco, Ramos 2019

cac

Bioinformatics is a field that has changed the way experiments and research in the biological sciences are designed and carried out. Biotechnology has not been left outside the reach of bioinformatics, which has directly affected areas such as drug discovery and development, crop improvement, bioremediation, studies of environmental diversity, and molecular pathology, among others. This is due, in large part, to the development of high-throughput or next-generation sequencing (NGS) technologies, which have generated large amounts of data that must be processed and analyzed to produce new knowledge and discoveries. This has led to two areas of bioinformatics and computer science, machine learning and deep learning, being used to analyze these data. "Machine learning" applies techniques that allow computers to learn, while "deep learning" builds artificial neural network models that attempt to imitate the functioning of the human brain, allowing them to learn from data and improve their learning through experience. These two areas are essential for identifying, analyzing, interpreting, and extracting knowledge from the large amount of biological data (big biological data). In this work we review these two areas, machine learning and deep learning, with a focus on their impact and applications in biotechnology.
