2017
DOI: 10.1002/humu.23185

Predicting enhancer activity and variant impact using gkm‐SVM

Abstract: We participated in the Critical Assessment of Genome Interpretation eQTL challenge to further test computational models of regulatory variant impact and their association with human disease. Our prediction model is based on a discriminative gapped-kmer SVM (gkm-SVM) trained on genome-wide chromatin accessibility data in the cell type of interest. The comparisons with Massively Parallel Reporter Assays (MPRA) in lymphoblasts show that gkm-SVM is among the most accurate prediction models even though all other mo…
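
To make the idea of scoring a variant with gapped k-mer features concrete, here is a minimal Python sketch. It is not the authors' implementation: gkm-SVM itself is a kernel method, whereas this sketch substitutes an explicit linear score over gapped k-mer counts. The function names, the parameters `l` and `k`, and the `weights` dictionary are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def gapped_kmers(seq, l=4, k=3):
    """Count gapped k-mers: each length-l window with l-k positions masked as '.'."""
    counts = {}
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        # Choose which k of the l positions stay informative.
        for keep in combinations(range(l), k):
            key = "".join(c if j in keep else "." for j, c in enumerate(window))
            counts[key] = counts.get(key, 0) + 1
    return counts


def score(seq, weights, l=4, k=3):
    """Linear score: sum of (hypothetical) feature weight times feature count."""
    return sum(weights.get(f, 0.0) * n for f, n in gapped_kmers(seq, l, k).items())


def variant_effect(ref_seq, alt_seq, weights, l=4, k=3):
    """Predicted impact of a variant: alt-allele score minus ref-allele score."""
    return score(alt_seq, weights, l, k) - score(ref_seq, weights, l, k)


# Hypothetical weights; a real model would learn these from accessibility data.
example_weights = {"ACG.": 1.0, "A.GT": 0.5}
effect = variant_effect("ACGT", "ACCT", example_weights)
```

With these made-up weights, the G-to-C change removes both weighted features, so `effect` is negative, meaning the alternate allele is predicted to reduce activity.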

Cited by 46 publications (44 citation statements)
References 38 publications

“…The distinguishing feature of the top three performing methods is that they all used DNA sequence features derived from Deep Neural Networks (DNN) trained on ENCODE data (DeepSEA or similar network methods). Thus one of the main conclusions of this study is that machine learning‐based DNA sequence features are the best predictors of mutation impact in enhancers and promoters, consistent with our previous findings (Beer; Inoue et al; Kreimer et al; Lee et al). The top three groups all did particularly well on F9 and TERT‐GBM, which we will discuss below.…”
Section: Results; citation type: supporting
confidence: 91%
“…Blind community assessments provide the most principled way to gauge the performance of the leading computational prediction models. The 2016 Critical Assessment of Genome Interpretation (CAGI 4) eQTL challenge (Beer; Kreimer et al; Tewhey et al; Zeng, Edwards, Guo, & Gifford) assessed the effect of common human variation on the enhancer activity in lymphoblast cell lines. It established that the top performing state‐of‐the‐art models of the enhancer activity typically used machine learning methods e.g.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…CAGI challenges span a wide range of relationships between genetic variation and disease. For single base variants, there are challenges that address the problem of interpreting the impact of missense mutations on protein activity using a variety of molecular and cellular phenotypes, challenges that test the ability to predict the effect of mutations in cancer driver genes on cell growth, and challenges on the effect of single‐base variants on RNA expression levels and splicing (including Beer; Capriotti, Martelli, Fariselli, & Casadio; Carraro et al.; Katsonis & Lichtarge; Kreimer et al.; Niroula & Vihinen; Pejaver et al.; Tang et al., 2017; Tang & Fenton; Xu et al.; Yin et al.; Zeng, Edwards, Guo, & Gifford; Zhang et al.). At the level of full exome and genome sequence, there are challenges that assess methods for assigning complex traits phenotypes and that evaluate the ability to associate genome sequence and an extensive profile of phenotypic traits (including Cai et al., 2017; Daneshjou et al.; Daneshjou et al.; Giollo et al.; Laksshman, Bhat, Viswanath, & Li; Pal, Kundu, Yin, & Moult; Wang et al.).…”
Citation type: mentioning
confidence: 99%
“…Recently, these approaches have been used to model regulatory activity measurements from MPRAs. An SVM-based model [23] was the top performer in a challenge that benchmarked several methods for predicting MPRA activity of DNA sequences flanking regulatory genetic variants [24]. Kalita et al [25] developed a statistical model to estimate allelic imbalance at regulatory variants based on MPRA measurements.…”
Section: Introduction; citation type: mentioning
confidence: 99%