The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but a similar reference has been lacking for epigenomic studies. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection to date of human epigenomes for primary cells and tissues. Here, we describe the integrative analysis of 111 reference human epigenomes generated as part of the program, profiled for histone modification patterns, DNA accessibility, DNA methylation, and RNA expression. We establish global maps of regulatory elements and define regulatory modules of coordinated activity, together with their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation, and human disease.
Summary: Cancer progression depends on both cell-intrinsic processes and interactions between different cell types. However, large-scale assessment of cell type composition and molecular profiles of individual cell types within tumors remains challenging. To address this, we developed Epigenomic Deconvolution (EDec), an in silico method that infers cell type composition of complex tissues as well as DNA methylation and gene transcription profiles of constituent cell types. By applying EDec to The Cancer Genome Atlas (TCGA) breast tumors, we detect changes in immune cell infiltration related to patient prognosis, and a striking change in stromal fibroblast to adipocyte ratio across breast cancer subtypes. We further show that a less adipose stroma tends to display lower levels of mitochondrial activity and to be associated with cancerous cells with higher levels of oxidative metabolism. These findings highlight the role of stromal composition in the metabolic coupling between distinct cell types within tumors.
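To make the idea of inferring cell type composition from bulk methylation concrete, here is a minimal sketch. It is not the EDec algorithm: it assumes that the methylation profiles of two cell types are already known and that the bulk signal is a linear mixture of them, and it estimates only the mixing fraction by least squares. The function name, the two-component model, and all numeric values are hypothetical illustrations.

```python
def estimate_fraction(bulk, profile_a, profile_b):
    """Least-squares estimate of the fraction of cell type A in a bulk sample.

    Assumed model (for illustration only):
        bulk[i] = f * profile_a[i] + (1 - f) * profile_b[i]
    Solving for f by least squares gives
        f = sum((bulk - b) * (a - b)) / sum((a - b) ** 2)
    """
    num = sum((m - b) * (a - b) for m, a, b in zip(bulk, profile_a, profile_b))
    den = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    if den == 0:
        raise ValueError("profiles are identical; the fraction is not identifiable")
    # Clamp to [0, 1] because a fraction outside that range is not meaningful.
    return min(1.0, max(0.0, num / den))


# Hypothetical methylation beta values at four CpG loci.
fibroblast = [0.9, 0.1, 0.8, 0.2]
adipocyte = [0.1, 0.7, 0.3, 0.6]

# Simulate a bulk sample that is 25% fibroblast and 75% adipocyte.
true_fraction = 0.25
bulk = [true_fraction * a + (1 - true_fraction) * b
        for a, b in zip(fibroblast, adipocyte)]

print(round(estimate_fraction(bulk, fibroblast, adipocyte), 3))
```

A method such as the one described in the abstract must additionally infer the cell type profiles themselves, which this sketch takes as given.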
To assess the impact of genetic variation in regulatory loci on human health, we construct a high-resolution map of allelic imbalances in DNA methylation, histone marks, and gene transcription in 71 epigenomes from 36 distinct cell and tissue types from 13 donors. Deep whole-genome bisulfite sequencing of 49 methylomes reveals sequence-dependent CpG methylation imbalances at thousands of heterozygous regulatory loci. Such loci are enriched for stochastic switching, defined as random transitions between fully methylated and unmethylated states of DNA. The methylation imbalances at thousands of loci are explainable by different relative frequencies of the methylated and unmethylated states for the two alleles. Further analyses provide a unifying model that links sequence-dependent allelic imbalances of the epigenome, stochastic switching at gene regulatory loci, and disease-associated
Tissue-specific expression of lincRNAs suggests developmental and cell-type-specific functions, yet tissue specificity has been established for only a small fraction of lincRNAs. Here, by analysing 111 reference epigenomes from the NIH Roadmap Epigenomics project, we determine tissue-specific epigenetic regulation for 3,753 lincRNAs (69% of those examined), with 54% active in one of the 14 cell/tissue clusters and an additional 15% in two or three clusters. A larger fraction of lincRNA TSSs is marked in a tissue-specific manner by H3K4me1 than by H3K4me3. The tissue-specific lincRNAs are strongly linked to tissue-specific pathways and undergo distinct chromatin state transitions during cellular differentiation. Polycomb-regulated lincRNAs reside in the bivalent state in embryonic stem cells and many of them undergo H3K27me3-mediated silencing at early stages of differentiation. The exquisitely tissue-specific epigenetic regulation of lincRNAs and the assignment of a majority of them to specific tissue types will inform future studies of this newly discovered class of genes.
Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
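One simple way to quantify reproducibility between replicate call sets is sketched below. The abstract does not state which metric the study used, so the Jaccard index here is an assumption chosen for illustration, and the variant tuples are made-up examples rather than real calls.

```python
def jaccard(calls_a, calls_b):
    """Jaccard index of two variant call sets: |A and B| / |A or B|.

    Each variant is represented as a hashable tuple
    (chromosome, position, reference allele, alternate allele).
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        # Two empty call sets agree trivially.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical calls from two replicates of the same sample.
replicate_1 = {
    ("chr1", 1000, "A", "G"),   # SNV seen in both replicates
    ("chr1", 2000, "C", "T"),   # SNV seen only in replicate 1
    ("chr2", 500, "G", "GA"),   # 1 bp insertion seen in both replicates
}
replicate_2 = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 500, "G", "GA"),
    ("chr3", 42, "T", "C"),     # SNV seen only in replicate 2
}

print(jaccard(replicate_1, replicate_2))
```

Computing the same index separately for SNVs and indels, or for each aligner and caller combination, would give the kind of per-factor comparison the abstract describes.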
The abundance of Lp(a) protein, which is directly impacted by the copy number (CN) of KIV-2, a 5.5 kbp sub-region, holds significant implications for the risk of cardiovascular disease (CVD). KIV-2 is highly polymorphic in the population and accurate analysis is challenging. In this study, we present the DRAGEN KIV-2 CN caller, which utilizes short reads. Data across 166 WGS samples show that the caller has high accuracy compared to optical mapping and can further phase ~50% of the samples. We compared KIV-2 CN estimates to 24 previously postulated KIV-2 relevant SNVs, revealing that many are ineffective predictors of KIV-2 copy number. Population studies, including USA-based cohorts, showed distinct KIV-2 CN distributions for European-, African-, and Hispanic-American populations and further underscored the limitations of SNV predictors. We demonstrate that the CN estimates correlate significantly with the available Lp(a) protein levels and that phasing is highly important.
In recent years, immune checkpoint inhibitors have shown great promise in treating various cancer types; however, only a fraction of patients respond to this type of immunotherapy. Currently, PD-L1 expression and microsatellite-instability (MSI) status are FDA-approved biomarkers to guide selected checkpoint-based immunotherapy; however, due to the complexity of tumor-immune interactions, it is unlikely that any single biomarker will be able to comprehensively predict clinical outcomes across the gamut of immunotherapeutics. Many genomic and cellular features have been shown to contribute to the effectiveness of immunotherapy, including tumor mutational burden (TMB), tumor T cell infiltrate, HLA gene expression, and Treg/myeloid-derived suppressor cell (MDSC) infiltrates. Here we have built a machine learning model trained on multiple features derived from whole exome sequencing (WES) and whole transcriptome sequencing (WTS) data. The resulting patient-specific “immunoscore” describes the likelihood that a patient will respond to a specified immunotherapy. Additionally, we have developed a novel visualization schema, which summarizes the full model immunoscore, as well as the weight and impact of each genomic and transcriptomic feature. In this study, we assessed WES and WTS data from two melanoma cohorts, the first treated with anti-PD1 immunotherapy and the second with anti-CTLA4 immunotherapy. First, we extracted genomic/transcriptomic features around five functional groups: antigen presentation, tumor lymphocyte infiltration, checkpoint gene signatures, interferon-gamma gene signatures, and Treg/MDSC gene signatures. Next, random forest classification was performed to identify significant features and weight the relative importance of each. A final immunoscore was calculated as the patient-specific probability of immunotherapy response, scaled from zero (0% likelihood of response) to ten (100% likelihood of response).
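The scaling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the response probability is taken as the fraction of trees in the forest that vote "responder", which is one common convention, and the function names and vote values are hypothetical.

```python
def vote_fraction(tree_votes):
    """Fraction of trees voting 'responder' (1) rather than 'non-responder' (0).

    Using the vote fraction as the response probability is an assumption
    made for this sketch.
    """
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    return sum(tree_votes) / len(tree_votes)


def immunoscore(probability):
    """Scale a response probability in [0, 1] to a score in [0, 10]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return 10.0 * probability


# Hypothetical votes from a four-tree forest for one patient.
votes = [1, 1, 0, 1]
print(immunoscore(vote_fraction(votes)))
```

A score of zero then corresponds to a 0% likelihood of response and a score of ten to a 100% likelihood, matching the scale given in the abstract.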
We noted that the highest-weighted feature for anti-PD1 response came from the antigen presentation feature group, while the highest-weighted feature for anti-CTLA4 response came from the tumor lymphocyte infiltration feature group, which is consistent with the underlying mechanistic difference between the two checkpoint inhibitors. Finally, our immunoscore has shown significantly better performance compared to any single feature based on 3-fold cross validation (p<0.05). In summary, we have built a machine learning model to predict patient responses to immunotherapies based on WES and WTS data and show that integrating multiple established biomarkers delivers superior performance compared to any individual biomarker. Moreover, our framework can be extended to include novel genomic/transcriptomic features that are identified as mediating immunotherapy response. Citation Format: Mengchi Wang, Aaron Wise, Han Kang, Vitor F. Onuchic, Ali Kuraishy, Sven Bilke, Kristina M. Kruglyak, Shile Zhang. A comprehensive immunoscore to predict immunotherapy responses based on multivariate genomic/transcriptomic features [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 569.