2023
DOI: 10.1101/2023.11.09.563812
Preprint

An encyclopedia of enhancer-gene regulatory interactions in the human genome

Andreas R. Gschwind,
Kristy S. Mualim,
Alireza Karbalayghareh
et al.

Abstract: Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease 1–6. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium 7. We first create a systematic benchmarking pipeline to compare …

citations
Cited by 13 publications
(24 citation statements)
references
References 60 publications
“…Because noncoding DNA sequences have highly context-specific effects and CRISPR variant editing will not be possible in many cell types in vivo in the human body, development of accurate computational models for predicting effects of variants on gene expression will be essential for complete dissection of human gene regulatory sequences. Previous work in other domains such as 3D protein structure prediction and enhancer-gene regulatory interactions has highlighted how an important step in the development of such models will be the collection of sufficiently large gold-standard datasets 82,83. The dataset we collected here represents, to our knowledge, the largest describing the effects of isogenic sequence variants on quantitative gene expression in an endogenous genomic context, and so we explored benchmarking the performance of recent and new predictive models of variant effects.…”
Section: Discussion (mentioning)
confidence: 99%
“…Third, none of the models appear to correctly interpret the effects of edits to the distal enhancers without explicit external calibration, either due to the limited sequence context of the local models (for ChromBPNet) or because the long-range models do not properly learn the importance of this distal sequence (for Enformer, Fig. 4c; for other benchmarks versus CRISPRi enhancer perturbation data, see also 83,84). Our results suggest an alternative route to capture long-range effects of variants, by first predicting the local effects of variants on enhancer accessibility or activity and then propagating those effects to gene expression via an enhancer-gene linking model.…”
Section: Discussion (mentioning)
confidence: 99%
“…We applied pgBoost and existing peak-gene linking methods to 3 scRNA/ATAC-seq multiome data sets 22,47,48 spanning 6 cell types and 85K cells. We demonstrated that pgBoost significantly outperforms constituent single-cell methods and genomic distance in enrichment for SNP-gene links derived from eQTL 33,34, Activity-By-Contact (ABC) 17,35, CRISPRi 15,36–42, and GWAS 43,44 data. In particular, pgBoost substantially outperformed existing methods in evaluations of longer-range links, which are of high biological importance (for example, non-coding variants in GWAS often do not regulate the closest gene 8–11) and are more difficult to capture using distance-based approaches (e.g.…”
Section: Discussion (mentioning)
confidence: 99%
“…Here, we propose an eQTL-informed gradient boosting 32 approach (pgBoost) that integrates linking scores from existing peak-gene linking methods across cell types and data sets with genomic distance, training on fine-mapped eQTL data to assign a single probabilistic score to each candidate SNP-gene link. We evaluate the performance of pgBoost and existing single-cell peak-gene linking methods by evaluating their enrichment for several sets of SNP-gene links derived from eQTL 33,34, Activity-By-Contact (ABC) 17,35, CRISPRi 15,36–42, and GWAS 43,44 data. We also investigate whether restricting to single-cell data from a focal cell type can improve power to detect regulatory links relevant to that cell type.…”
Section: Introduction (mentioning)
confidence: 99%
“…Therefore, enhancers are typically placed very close to the promoter, so enhancer-promoter communication over large genomic distances cannot be investigated (Muerdter et al 2018). Furthermore, most genes are controlled by multiple enhancers that activate the promoter together (Gschwind et al 2023). While first studies combined two enhancers to study cooperativity in their activation of a single target gene (Loubiere et al 2023; Martinez-Ara, Comoglio, and Steensel 2023), the genomic distance between the elements was negligible compared to the distances enhancers have to overcome in mammalian genomes.…”
Section: Introduction (mentioning)
confidence: 99%