2017
DOI: 10.1002/humu.23197

Predicting gene expression in massively parallel reporter assays: A comparative study

Abstract: In many human diseases, associated genetic changes tend to occur within non-coding regions, whose effect might be related to transcriptional control. A central goal in human genetics is to understand the function of such non-coding regions: Given a region that is statistically associated with changes in gene expression (expression Quantitative Trait Locus; eQTL), does it in fact play a regulatory role? And if so, how is this role “coded” in its sequence? These questions were the subject of the Critical Assessm…

Cited by 42 publications
(56 citation statements)
References 68 publications
“…The distinguishing feature of the top three performing methods is that they all used DNA sequence features derived from Deep Neural Networks (DNN) trained on ENCODE data (DeepSEA or similar network methods). Thus one of the main conclusions of this study is that machine learning‐based DNA sequence features are the best predictors of mutation impact in enhancers and promoters, consistent with our previous findings (Beer, ; Inoue et al, ; Kreimer et al, ; Lee et al, ). The top three groups all did particularly well on F9 and TERT‐GBM, which we will discuss below.…”
Section: Results (supporting)
confidence: 91%
“…Blind community assessments provide the most principled way to gauge the performance of the leading computational prediction models. The 2016 Critical Assessment of Genome Interpretation (CAGI 4) eQTL challenge (Beer, ; Kreimer et al, ; Tewhey et al, ; Zeng, Edwards, Guo, & Gifford, ) assessed the effect of common human variation on the enhancer activity in lymphoblast cell lines. It established that the top performing state‐of‐the‐art models of the enhancer activity typically used machine learning methods e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…The same sequences were tested in LCL and HepG2 cells, thus forming the two data sets. Notably, the LCL‐eQTL data set was used as the primary source for the CAGI4 eQTL causal challenge (Kreimer et al, ). The fourth and fifth data sets (Inoue et al, ) include candidate liver enhancers, tested in either episomal or chromosomal context.…”
Section: Results (mentioning)
confidence: 99%
“…(c) HepG2‐eQTL —the same set of elements (Tewhey et al, ) as above, tested in episomal context in HepG2 cell line instead of LCL. For both data sets 2 and 3, all of the 78,738 regions were used to fit MPRAnalyze, whereas 3,044 regions corresponding to the first test group in the CAGI4 challenge (Kreimer et al, ) were used for the remaining analyses. (d) HepG2‐chr —2,236 candidate liver enhancers (Fumitaka Inoue et al, ) and 102 positive and 102 negative control sequences.…”
Section: Methods (mentioning)
confidence: 99%
“…Here, we show on the eQTL challenge dataset and on previously published datasets that gkm‐SVM is indeed a reliable predictor of expression levels, in addition to variant impact. As described in more detail in the eQTL challenge overview paper (Kreimer et al., ), the eQTL challenge dataset reports expression levels in lymphoblast cell lines (LCLs) from a massively parallel reporter assay (MPRA) for both alleles of a set of 9,116 150‐bp human DNA sequences encompassing variants that had previously been identified as eQTL loci in LCLs (1000 Genomes Project Consortium et al., ; Lappalainen et al., ). Prediction groups were provided the expression levels of a subset of 3,044 pairs of alleles as a training set to train parameters of the computational prediction models.…”
Section: Introduction (mentioning)
confidence: 99%