Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks

Kelley, David R.; Snoek, Jasper; Rinn, John L.

doi:10.1101/028399

Cited by 163 publications

(278 citation statements)

References 58 publications

(78 reference statements)

Mentioning: 276

“…In particular, we aimed to identify DNA sequences that could predict cell-type-specific effects of regulatory variants. We investigated the use of machine learning models to predict the chromatin activity of regulatory elements across our three cell types using DNA sequence only (Zhou and Troyanskaya 2015; Hashimoto et al. 2016; Kelley et al. 2016; Zeng et al. 2016). We developed a four-layered neural network architecture, OrbWeaver, to predict cell-type-specific chromatin accessibility of 500-bp windows centered at a regulatory locus (Fig.…”

Section: Sequence-based Model for Chromatin Activity Explains the Reg… (mentioning, confidence: 99%)

Impact of regulatory variation across human iPSCs and differentiated cells

et al. 2017


Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.
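The approach described above, predicting accessibility from DNA sequence alone and then scoring a SNP by comparing the predictions for its two alleles, can be sketched in a few lines of Python. This is a toy, not OrbWeaver or Basset: it assumes a single hand-written convolutional filter (the hypothetical `motif_filter("GATA")`), a ReLU followed by max-pooling, and a sigmoid output with an arbitrary bias, and it uses short sequences instead of 500-bp windows. The function names are illustrative only.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of 4 values (A, C, G, T); any other base (e.g. N) is all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_filter(motif):
    """Hypothetical hand-set filter: +1 for the motif base at each position, -1 otherwise."""
    return [[1.0 if b == m else -1.0 for b in BASES] for m in motif.upper()]


def conv_max(encoded, filt):
    """Slide one filter along the sequence and return the best activation.

    `best` starts at 0.0, so negative activations are clipped (like a ReLU)
    before the max-pooling over positions.
    """
    width = len(filt)
    best = 0.0
    for start in range(len(encoded) - width + 1):
        act = sum(encoded[start + i][j] * filt[i][j]
                  for i in range(width) for j in range(4))
        best = max(best, act)
    return best


def toy_accessibility(seq, filt, bias=-2.0):
    """Hypothetical one-filter 'model': sigmoid(max-pooled activation + bias)."""
    return 1.0 / (1.0 + math.exp(-(conv_max(one_hot(seq), filt) + bias)))


def variant_effect(seq, pos, alt, filt):
    """Predicted accessibility of the alternate allele minus that of the reference."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return toy_accessibility(alt_seq, filt) - toy_accessibility(seq, filt)


if __name__ == "__main__":
    gata = motif_filter("GATA")
    ref = "TTTTGATATTTT"
    print(round(variant_effect(ref, 5, "C", gata), 4))
```

In this example the A-to-C substitution at position 5 breaks the only full filter match, so the predicted accessibility falls and the effect is negative. A trained network stacks many learned filters and layers, but the allele-comparison step has the same shape.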


“…Although SCM differs from existing methods aimed at binary classification of hypersensitive and nonhypersensitive chromatin, we asked how SCM performance compares to four sequence-based classifiers that use either k-mer-based models (gkm-SVM, SeqGL) or deep-learning-based models (DeepSEA, Basset) (Ghandi et al. 2014; Setty and Leslie 2015; Zhou and Troyanskaya 2015; Kelley et al. 2016). Although SCM is designed for quantitation and not binary prediction, SCM performs as well as the four state-of-the-art binary predictive methods on black-box binary prediction of functional genomic regions (Supplemental Fig.…”

Section: www.genome.org (mentioning, confidence: 99%)

A synergistic DNA logic predicts genome-wide chromatin accessibility

et al. 2016


Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which when trained with DNase-seq data for a cell type is capable of predicting expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. We confirm that an SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences using a novel CRISPR-based method of highly efficient site-specific DNA library integration. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer TFs, is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.
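The log-additive idea in this abstract can be illustrated with a minimal sketch. It assumes that each k-mer occurrence adds a fixed weight to the log of the expected read count, so feature effects multiply on the count scale. This is not the SCM implementation: the weights are hypothetical, there is no positional or synergy term, and `kmer_counts` and `expected_reads` are illustrative names.

```python
import math


def kmer_counts(seq, k):
    """Count every overlapping k-mer in `seq`."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def expected_reads(seq, weights, k, bias=0.0):
    """Log-additive model: log(expected count) = bias + sum of k-mer weights.

    `weights` maps k-mers to hypothetical log-scale contributions; k-mers
    missing from the table contribute 0.
    """
    log_rate = bias + sum(n * weights.get(kmer, 0.0)
                          for kmer, n in kmer_counts(seq.upper(), k).items())
    return math.exp(log_rate)


if __name__ == "__main__":
    # Hypothetical weights: GATA doubles the expected count, CACG triples it.
    w = {"GATA": math.log(2), "CACG": math.log(3)}
    print(expected_reads("GATACACG", w, 4))
```

With these weights, a sequence that contains both GATA and CACG gets an expected count of about 2 × 3 = 6 (up to floating-point rounding). That multiplicative behavior is what "log-additive" implies.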


“…These methods include DeepSEA (Zhou and Troyanskaya, 2015), DeepBind (Alipanahi et al., 2015) and Basset (Kelley et al., 2016) that 'deep learn' regulatory sequence code from big genomics data; deltaSVM (Lee et al., 2015) and deSNPs (Huang and Ovcharenko, 2015; Li and Ovcharenko, 2015) that learn sequence features from a single enhancer-associated chromatin profile and consider the k-mer content associated with the genetic variant only; CATO (Maurano et al., 2015) that predicts chromatin states by using high-throughput sequencing data across multiple individuals; C-SCORE (Kircher et al., 2014) that integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations; LINSIGHT (Huang et al., 2017) that predicts the likelihood of deleterious fitness consequences of mutations at noncoding nucleotide sites by combining a generalized linear model for functional genomic data with a probabilistic model of molecular evolution; and CAPE (Li et al., 2016) that decomposes the sequence code of potential binding sites and the binding sites of cofactors from a set of chromatin profiles, and directly quantifies the deactivating effect of a single nucleotide mutation based on the corresponding change in the underlying k-mer profile.…”

Section: Introduction (mentioning, confidence: 99%)

SNPDelScore: combining multiple methods to score deleterious effects of noncoding mutations in the human genome

Alvarez

Landsman

et al. 2017

Bioinformatics


Summary: Addressing deleterious effects of noncoding mutations is an essential step towards the identification of disease-causal mutations of gene regulatory elements. Several methods for quantifying the deleteriousness of noncoding mutations using artificial intelligence, deep learning and other approaches have been recently proposed. Although the majority of the proposed methods have demonstrated excellent accuracy on different test sets, there is rarely a consensus. In addition, advanced statistical and artificial learning approaches used by these methods make it difficult to port these methods outside of the labs that have developed them. To address these challenges and to transform the methodological advances in predicting deleterious noncoding mutations into a practical resource available for the broader functional genomics and population genetics communities, we developed SNPDelScore, which uses a panel of proposed methods for quantifying deleterious effects of noncoding mutations to precompute and compare the deleteriousness scores of all common SNPs in the human genome in 44 cell lines. The panel of deleteriousness scores of a SNP computed using different methods is supplemented by functional information from the GWAS Catalog, libraries of transcription factor-binding sites, and genic characteristics of mutations. SNPDelScore comes with a genome browser capable of displaying and comparing large sets of SNPs in a genomic locus and rapidly identifying consensus SNPs with the highest deleteriousness scores, making those prime candidates for phenotype-causal polymorphisms.

Availability and implementation: https://www.ncbi.nlm.nih.gov/research/snpdelscore/

Contact: ovcharen@nih.gov
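Several methods in the SNPDelScore panel score a variant by how it changes local k-mer content. The quoted passage describes this for deltaSVM, and for CAPE as the change in the underlying k-mer profile. The following minimal sketch illustrates that idea, assuming a precomputed table of k-mer weights. It sums, over every k-mer that overlaps the variant, the alternate-allele weight minus the reference-allele weight. This is a simplified illustration, not either tool's code: the weight table is hypothetical, the function names are illustrative, and the sketch ignores reverse-complement k-mers, which real k-mer models commonly merge.

```python
def overlapping_kmers(seq, pos, k):
    """Return every length-k window of `seq` that contains position `pos`."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position outside sequence")
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(start, end + 1)]


def delta_kmer_score(seq, pos, alt, weights, k):
    """Sum of (alt k-mer weight - ref k-mer weight) over k-mers overlapping the variant.

    `weights` is a hypothetical k-mer -> score table; unseen k-mers score 0.
    Negative values mean the alternate allele is predicted to reduce activity.
    """
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    ref_kmers = overlapping_kmers(seq, pos, k)
    alt_kmers = overlapping_kmers(alt_seq, pos, k)
    return sum(weights.get(a, 0.0) - weights.get(r, 0.0)
               for r, a in zip(ref_kmers, alt_kmers))


if __name__ == "__main__":
    w = {"GATA": 1.5}  # hypothetical weight for one activating 4-mer
    print(delta_kmer_score("AAGATAAA", 3, "C", w, 4))  # destroys GATA
```

Because only the k-mers spanning the variant can change, the score needs just k lookups per allele. This is one reason such scores can be precomputed for all common SNPs, as SNPDelScore does.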

