The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but a similar reference has been lacking for epigenomic studies. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection to date of human epigenomes for primary cells and tissues. Here, we describe the integrative analysis of 111 reference human epigenomes generated as part of the program, profiled for histone modification patterns, DNA accessibility, DNA methylation, and RNA expression. We establish global maps of regulatory elements and define regulatory modules of coordinated activity, together with their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation, and human disease.
The pan-cancer analysis of whole genomes
The expansion of whole-genome sequencing studies from individual ICGC and TCGA working groups presented the opportunity to undertake a meta-analysis of genomic features across tumour types. To achieve this, the PCAWG Consortium was established. A Technical Working Group implemented the informatics analyses by aggregating the raw sequencing data from different working groups that studied individual tumour types, aligning the sequences to the human genome and delivering a set of high-quality somatic mutation calls for downstream analysis (Extended Data Fig. 1). Given the recent meta-analysis
Histones are frequently decorated with covalent modifications. These histone modifications are thought to be involved in various chromatin-dependent processes including transcription. To elucidate the relationship between histone modifications and transcription, we derived quantitative models to predict the expression level of genes from histone modification levels. We found that histone modification levels and gene expression are very well correlated. Moreover, we show that only a small number of histone modifications are necessary to accurately predict gene expression. We show that different sets of histone modifications are necessary to predict gene expression driven by high CpG content promoters (HCPs) or low CpG content promoters (LCPs). Quantitative models involving H3K4me3 and H3K79me1 are the most predictive of the expression levels in LCPs, whereas HCPs require H3K27ac and H4K20me1. Finally, we show that the connections between histone modifications and gene expression seem to be general, as we were able to predict gene expression levels of one cell type using a model trained on another one.
high CpG content promoter | low CpG content promoter | regression analysis | transcription
The DNA of eukaryotic organisms is packaged into chromatin, whose basic repeating unit is the nucleosome. A nucleosome is formed by wrapping 147 base pairs of DNA around an octamer of four core histones, H2A, H2B, H3, and H4 (1-5) which are subject to a number of posttranslational covalent modifications [(6); for review, see ref. 7]. These modifications can alter the chromatin structure and function by changing the charge of the nucleosome particle, and/or by recruiting protein complexes either individually or in combination (8).
Hence, histone modifications are thought to constitute a "Histone Code," which is read out by proteins to bring about specific downstream effects (9, 10).
Histone modifications have been linked to a number of chromatin-dependent processes, including replication, DNA-repair, and transcription. The link between histone modifications and transcription has been particularly intensively studied. It has been found that individual modifications can be associated with transcriptional activation or repression. Acetylation and phosphorylation generally accompany transcription; sumoylation, deimination, and proline isomerization are usually found in transcriptionally silent regions; methylation and ubiquitination are implicated in both activation and repression of transcription (8). Furthermore, the establishment of some modifications is dependent on the presence of other modifications, e.g., the catalysis of H3K4me3 requires the presence of H2BK120ub1 (the so-called trans-tail pathway) and the phosphorylation on serine 5 on the C-terminal domain of RNA polymerase II (pol II) (for review, see ref. 11, which also reviews other examples for the combinatorial action of histone modifications).
Transcription proceeds in a series of steps, also referred to as transcription cycle, starting with preinitiation complex form...
Cancer is a disease potentiated by mutations in somatic cells. Cancer mutations are not distributed uniformly along the genome. Instead, different genomic regions vary by up to 5-fold in the local density of somatic mutations (1), posing a fundamental problem for statistical methods of cancer genomics. Epigenomic organization has been proposed as a major determinant of the cancer mutational landscape (1-5). However, both somatic mutagenesis and epigenomic features are highly cell-type-specific (6, 7). We investigated the distribution of mutations in multiple samples of diverse cancer types and compared them to cell-type-specific epigenomic features. Here, we show that chromatin accessibility and modification, together with replication timing, explain up to 86% of the variance in mutation rates along cancer genomes. Overwhelmingly, the best predictors of local somatic mutation density are epigenomic features derived from the most likely cell type of origin of the corresponding malignancy. Moreover, we find that cell-of-origin chromatin features are much stronger determinants of cancer mutation profiles than chromatin features of cognate cancer cell lines. We show further that the cell type of origin of a cancer can be accurately determined based on the distribution of mutations along its genome. Thus, the DNA sequence of a cancer genome encompasses a wealth of information about the identity and epigenomic features of its cell of origin.
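As a rough illustration of what "variance explained" means in the abstract above, the following minimal sketch regresses a per-window mutation density on a few epigenomic features and reports R². Everything here is an assumption for illustration: the data are synthetic, the feature columns, weights, and the `variance_explained` helper are hypothetical, and ordinary least squares is used as a simple stand-in rather than as the authors' actual modelling approach.

```python
import numpy as np


def variance_explained(features, mutation_density):
    """Return R^2 of an ordinary least-squares fit of density on features."""
    # Prepend a column of ones so the fit includes an intercept term.
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, mutation_density, rcond=None)
    residuals = mutation_density - X @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((mutation_density - mutation_density.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT from the paper): one row per genomic window,
# columns are hypothetical accessibility, histone mark, and replication timing.
rng = np.random.default_rng(0)
n_windows = 500
features = rng.normal(size=(n_windows, 3))
true_weights = np.array([-0.8, -0.5, 0.6])  # arbitrary illustrative weights
density = features @ true_weights + rng.normal(scale=0.5, size=n_windows)

r2 = variance_explained(features, density)
print(f"Fraction of variance explained (R^2): {r2:.2f}")
```

With real data, each row would correspond to a fixed-size genomic window, and features taken from different candidate cell types could be compared by the R² each set achieves.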
Biallelic inactivation of BRCA1 or BRCA2 is associated with a pattern of genome-wide mutations known as signature 3. By analyzing ∼1,000 breast cancer samples, we confirmed this association and established that germline nonsense and frameshift variants in PALB2, but not in ATM or CHEK2, can also give rise to the same signature. We were able to accurately classify missense BRCA1 or BRCA2 variants known to impair homologous recombination (HR) on the basis of this signature. Finally, we show that epigenetic silencing of RAD51C and BRCA1 by promoter methylation is strongly associated with signature 3 and, in our data set, was highly enriched in basal-like breast cancers in young individuals of African descent.
Here, we analyzed genomic features of 412 biliary tract cancer (BTC) samples from Japanese and Italian populations. A total of 32 significantly and commonly mutated genes were identified, some of which negatively affected patient prognosis, including a novel deletion of MUC17 at 7q22.1. Cell-of-origin predictions using WGS and epigenetic features suggest a hepatocyte origin of hepatitis-related intrahepatic cholangiocarcinomas (ICCs). Deleterious germline mutations of cancer-predisposing genes were detected in 11% of patients with BTC. BTCs have distinct genetic features including somatic events and germline predisposition.
In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole-genome sequencing (WGS) of 2,606 tumours representing 24 common cancer types produced by the PCAWG Consortium. Our classifier achieves an accuracy of 91% on held-out tumour samples and 88% and 83%, respectively, on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.
Summary: Genomic DNA replicates in a choreographed temporal order that impacts the distribution of mutations along the genome. We show here that DNA replication timing is shaped by genetic polymorphisms that act in cis upon megabase-scale DNA segments. In genome sequences from proliferating cells, read depth along chromosomes reflected DNA replication activity in those cells. We used this relationship to analyze variation in replication timing among 161 individuals sequenced by the 1000 Genomes Project. Genome-wide association of replication timing with genetic variation identified 16 loci at which inherited alleles associate with replication timing. We call these “replication timing quantitative trait loci” (rtQTLs). rtQTLs involved the differential use of replication origins, exhibited allele-specific effects on replication timing, and associated with gene expression variation at megabase scales. Our results show replication timing to be shaped by genetic polymorphism, and identify a means by which inherited polymorphism regulates the mutability of nearby sequences.