2022
DOI: 10.1038/s41467-022-34152-5
Learning the histone codes with large genomic windows and three-dimensional chromatin interactions using transformer

Abstract: The quantitative characterization of the transcriptional control by histone modifications has been challenged by many computational studies, but most of them focus only on narrow, linear genomic regions around promoters, leaving room for improvement. We present Chromoformer, a transformer-based, three-dimensional chromatin conformation-aware deep learning architecture that achieves state-of-the-art performance in the quantitative deciphering of the histone codes in gene regulation. The core essence of…
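The abstract's key idea, attending over a large promoter window and folding in histone signals from regions that contact the promoter in 3D, can be illustrated schematically. The NumPy toy below is an assumption-laden sketch, not the published Chromoformer architecture: single-head attention without learned projections, made-up dimensions (40 bins, 7 marks), and random weights stand in for the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Single-head scaled dot-product attention, no learned projections.
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

# Toy dimensions: 7 histone marks binned over a wide promoter window,
# plus a few distal regions that contact the promoter in 3D (hypothetical data).
n_bins, n_marks, d = 40, 7, 32
W_in = rng.normal(size=(n_marks, d))

promoter = rng.normal(size=(n_bins, n_marks)) @ W_in        # (bins, d)
distal = [rng.normal(size=(n_bins, n_marks)) @ W_in for _ in range(3)]

# Step 1: self-attention across the wide promoter window.
h_prom = attention(promoter, promoter, promoter)

# Step 2: cross-attention lets promoter bins query the histone states of
# 3D-interacting regions, injecting conformation-aware context.
h_distal = np.concatenate(distal, axis=0)
h = h_prom + attention(h_prom, h_distal, h_distal)

# Step 3: pool and regress to a scalar expression estimate.
w_out = rng.normal(size=(d,))
print(float(h.mean(axis=0) @ w_out))
```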

Cited by 22 publications (32 citation statements)
References 49 publications
“…Another potential limitation of SpliceBERT is that it could only process sequences no longer than 1024 nt, because the memory complexity of vanilla Transformer layers is quadratic in the sequence length. However, in many genomics tasks, including splice site prediction, integrating large genomic context from thousands to millions of nucleotides is crucial to ensure accurate predictions (9, 46-48). One straightforward solution is to adopt Transformers with linear memory complexity with respect to sequence length.…”
Section: Discussion (mentioning)
Confidence: 99%
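The memory argument in this statement can be made concrete. The following is a minimal NumPy sketch (not from either paper) contrasting vanilla attention, which materializes an L×L score matrix, with a kernelized linear-attention reordering in the style of Linear Transformer/Performer; the feature map `phi` here is an arbitrary positive map chosen for illustration, not any model's actual kernel.

```python
import numpy as np

def vanilla_attention(Q, K, V):
    # Materializes an (L, L) score matrix: memory is quadratic in length L.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])               # (L, L)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V        # (L, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Reorders (Q K^T) V as Q (K^T V): the (d, d) summary is independent of L,
    # so memory grows only linearly with sequence length.
    Qp, Kp = phi(Q), phi(K)                               # (L, d)
    KV = Kp.T @ V                                         # (d, d)
    Z = Qp @ Kp.sum(axis=0)[:, None]                      # (L, 1) normalizer
    return (Qp @ KV) / Z

L, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))
print(vanilla_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

The two functions are not numerically equivalent; the point is the memory profile, which is why such reorderings are proposed for long genomic contexts.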
“…Since MYC is in both the in- and cross-cell-type test sets, we believe that CREaTor has learned general rules guiding cCRE-gene interactions in different cell types, rendering it an efficient tool for cCRE activity modeling in unseen cell types. CREaTor captures chromatin domain boundaries in unseen cell types. Three-dimensional (3D) chromatin folding allows physical interactions between distal cCREs and genes, and this information can also guide gene regulation modeling 13,28,42. Without incorporating 3D chromatin folding information in our model, we were curious to see whether CREaTor captured the topological structure of the genome, considering that CREaTor precisely recovers cCRE-gene interactions even at long ranges.…”
Section: Predictions Based on cCRE H3K27ac Signals and cCRE-Gene Dist... (mentioning)
Confidence: 99%
“…Trained in a self-supervised manner on vast amounts of genomic data, these models offer a comprehensive understanding of the sequence language and hold the potential to enable various downstream applications. Genomic foundation models have achieved encouraging results in promoter (Ji et al., 2021; Dalla-Torre et al., 2023; Zhou et al., 2023) and enhancer (Dalla-Torre et al., 2023) prediction, chromatin state analysis (Lee et al., 2022), transcription factor binding site prediction (Ji et al., 2021; Dalla-Torre et al., 2023; Zhou et al., 2023), and functional variant prioritization (Ji et al., 2021; Dalla-Torre et al., 2023). The DNABERT model (Ji et al., 2021) was one of the first adaptations of large language models to the human genome.…”
Section: Introduction (mentioning)
Confidence: 99%
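For context on the DNABERT mention: DNABERT represents DNA as overlapping k-mers before feeding it to a BERT-style encoder. A minimal sketch of that tokenization step (vocabulary construction and special tokens omitted; the function name is chosen here for illustration):

```python
def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    # Slide a window of width k one base at a time, producing overlapping k-mers.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

print(kmer_tokenize("ACGTACGT", k=3))
# ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
```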