The methylome is subject to genetic and environmental effects. Their impact may depend on sex and age, resulting in sex- and age-related physiological variation and disease susceptibility. Here we estimate the total heritability of DNA methylation levels in whole blood and estimate the variance explained by common single nucleotide polymorphisms at 411,169 sites in 2,603 individuals from twin families, to establish a catalogue of between-individual variation in DNA methylation. Heritability estimates vary across the genome (mean=19%) and interaction analyses reveal thousands of sites with sex-specific heritability as well as sites where the environmental variance increases with age. Integration with previously published data illustrates the impact of genome and environment across the lifespan at methylation sites associated with metabolic traits, smoking and ageing. These findings demonstrate that our catalogue holds valuable information on locations in the genome where methylation variation between people may reflect disease-relevant environmental exposures or genetic variation.
Background: Intra-sample cellular heterogeneity presents numerous challenges to the identification of biomarkers in large Epigenome-Wide Association Studies (EWAS). While a number of reference-based deconvolution algorithms have emerged, their potential remains underexplored and a comparative evaluation of these algorithms beyond tissues such as blood is still lacking. Results: Here we present a novel framework for reference-based inference, which leverages cell-type specific DNAse Hypersensitive Site (DHS) information from the NIH Epigenomics Roadmap to construct an improved reference DNA methylation database. We show that this leads to a marginal but statistically significant improvement of cell-count estimates in whole blood as well as in mixtures involving epithelial cell-types. Using this framework we compare a widely used state-of-the-art reference-based algorithm (called constrained projection) to two non-constrained approaches including CIBERSORT and a method based on robust partial correlations. We conclude that the widely-used constrained projection technique may not always be optimal. Instead, we find that the method based on robust partial correlations is generally more robust across a range of different tissue types and for realistic noise levels. We call the combined algorithm which uses DHS data and robust partial correlations for inference, EpiDISH (Epigenetic Dissection of Intra-Sample Heterogeneity). Finally, we demonstrate the added value of EpiDISH in an EWAS of smoking.
An outstanding challenge of Epigenome-Wide Association Studies (EWAS) performed in complex tissues is the identification of the specific cell-type(s) responsible for the observed differential DNA methylation. Here, we present a novel statistical algorithm, called CellDMC, which is able to identify not only differentially methylated positions, but also the specific cell-type(s) driving the differential methylation. We provide extensive validation of CellDMC on in-silico mixtures of DNA methylation data generated with different technologies, as well as on real mixtures from epigenome-wide-association and cancer epigenome studies. We demonstrate how CellDMC can achieve over 90% sensitivity and specificity in scenarios where current state-of-the-art methods fail to identify differential methylation. By applying CellDMC to a smoking EWAS performed in buccal swabs, we identify differentially methylated positions occurring in the epithelial compartment, which we validate in smoking-related lung cancer. CellDMC may help towards the identification of causal DNA methylation alterations in disease.
Background: Intra-sample cellular heterogeneity presents numerous challenges to the identification of biomarkers in large Epigenome-Wide Association Studies (EWAS). While a number of reference-based deconvolution algorithms have emerged, their potential remains underexplored and a comparative evaluation of these algorithms beyond tissues such as blood is still lacking.Results: Here we present a novel framework for reference-based inference, which leverages cell-type specific DNAse Hypersensitive Site (DHS) information from the NIH Epigenomics Roadmap to construct an improved reference DNA methylation database. We show that this leads to a marginal but statistically significant improvement of cell-count estimates in whole blood as well as in mixtures involving epithelial cell-types. Using this framework we compare a widely used state-of-the-art reference-based algorithm (called constrained projection) to two non-constrained approaches including CIBERSORT and a method based on robust partial correlations. We conclude that the widely-used constrained projection technique may not always be optimal. Instead, we find that the method based on robust partial correlations is generally more robust across a range of different tissue types and for realistic noise levels. We call the combined algorithm which uses DHS data and robust partial correlations for inference, EpiDISH (Epigenetic Dissection of Intra-Sample Heterogeneity).Finally, we demonstrate the added value of EpiDISH in an EWAS of smoking. Conclusions:Estimating cell-type fractions and subsequent inference in EWAS may benefit from the use of non-constrained reference-based cell-type deconvolution methods. BackgroundOne of the great challenges facing the omics community is that posed by intra-sample heterogeneity (ISH) [1]. Molecular profiles derived from complex tissues such as blood or breast represent averages over many different cell types. Since cell-type composition of complex tissues varies in response to phenotypes such as cancer or age, correcting for changes in tissue composition could be crucial if one wishes to identify potentially causal alterations in individual cell-types [2, 3]. This is particularly relevant for Epigenome-Wide Association Studies (EWAS) where the effect sizes of interest could be small compared to changes in tissue composition [3, 4]. Because of this, a number of different ISH deconvolution algorithms for genome-wide DNA methylation data have recently been proposed [5][6][7][8][9]. As with the analogous tools developed for mRNA expression data [10, 11], these algorithms can be classified as either "reference-free" [6, 8] or "reference-based" [5], depending on whether they use an a-priori database of cell-type specific DNA methylation (DNAm) reference profiles to perform the deconvolution. Although reference-free methods have the advantage that they don't require such a DNAm database and are therefore applicable, in principle, to any tissue-type, these algorithms have only been tested in blood and don't provide sample-specific absol...
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.