Genome-wide methylation arrays are powerful tools for assessing the cell composition of complex mixtures. We compare three approaches to selecting reference libraries for deconvoluting neutrophil, monocyte, B-lymphocyte, natural killer, and CD4+ and CD8+ T-cell fractions based on blood-derived DNA methylation signatures assayed using the Illumina HumanMethylationEPIC array. The IDOL algorithm identifies a library of 450 CpGs, resulting in an average R² of 99.2% across cell types when applied to EPIC methylation data collected on artificial mixtures constructed from the above cell types. Of the 450 CpGs, 69% are unique to EPIC. This library has the potential to reduce unintended technical differences across array platforms.
Electronic supplementary material: The online version of this article (10.1186/s13059-018-1448-7) contains supplementary material, which is available to authorized users.
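Reference-based deconvolution models the β-values of a mixed sample at the library CpGs as a weighted combination of the reference profiles of purified cell types, and estimates the weights (the cell fractions). The sketch below illustrates that idea under stated assumptions: it solves a non-negative least-squares problem by projected gradient descent, which is a simplified stand-in for the constrained projection used by published deconvolution tools. The function name, reference matrix, and fractions are hypothetical.

```python
def deconvolve(reference, bulk, n_iter=5000):
    """Estimate non-negative cell fractions w minimizing ||reference @ w - bulk||^2.

    reference: one row per CpG, each row holding the mean beta-value of every
               purified cell type at that CpG (hypothetical reference library).
    bulk:      observed beta-values of the mixed sample at the same CpGs.
    """
    n_cpg, n_cell = len(reference), len(reference[0])
    # 1 / (squared Frobenius norm) is a safe step size for projected gradient
    # descent on this least-squares objective.
    step = 1.0 / sum(v * v for row in reference for v in row)
    w = [1.0 / n_cell] * n_cell  # start from equal proportions
    for _ in range(n_iter):
        # Residual: predicted minus observed beta-values
        resid = [sum(reference[i][j] * w[j] for j in range(n_cell)) - bulk[i]
                 for i in range(n_cpg)]
        # Gradient of 0.5 * ||resid||^2 with respect to w
        grad = [sum(reference[i][j] * resid[i] for i in range(n_cpg))
                for j in range(n_cell)]
        # Gradient step followed by projection onto w >= 0
        w = [max(0.0, w[j] - step * grad[j]) for j in range(n_cell)]
    return w


# Hypothetical 4-CpG x 3-cell-type reference and a synthetic 50/30/20 mixture
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.5, 0.5, 0.5],
]
true_fractions = [0.5, 0.3, 0.2]
bulk = [sum(r * f for r, f in zip(row, true_fractions)) for row in reference]
print([round(f, 3) for f in deconvolve(reference, bulk)])
```

This sketch enforces only non-negativity; it does not constrain the estimated fractions to sum to one.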
Background: Confounding due to cellular heterogeneity represents one of the foremost challenges currently facing epigenome-wide association studies (EWAS). Statistical methods that leverage the tissue specificity of DNA methylation to deconvolute the cellular mixture of heterogeneous biospecimens offer a promising solution; however, the performance of such methods depends entirely on the library of methylation markers used for deconvolution. Here, we introduce a novel algorithm for Identifying Optimal Libraries (IDOL) that dynamically scans a candidate set of cell-specific methylation markers to find libraries that optimize the accuracy of cell fraction estimates obtained from cell mixture deconvolution.
Results: Application of IDOL to a training set consisting of samples with both whole-blood DNA methylation data (Illumina HumanMethylation450 BeadArray (HM450)) and flow cytometry measurements of cell composition revealed an optimized library comprising 300 CpG sites. When compared with existing libraries, the library identified by IDOL demonstrated significantly better overall discrimination of the entire immune cell landscape (p = 0.038) and improved discrimination of 14 of the 15 pairs of leukocyte subtypes. Estimates of cell composition across the samples in the training set using the IDOL library were highly correlated with their respective flow cytometry measurements, with all cell-specific R² > 0.99 and root mean square errors (RMSEs) ranging from 0.97% to 1.33% across leukocyte subtypes. Independent validation of the optimized IDOL library using two additional HM450 data sets showed similarly strong prediction performance, with all cell-specific R² > 0.90 and RMSE < 4.00%. In simulation studies, adjustments for cell composition using the IDOL library resulted in uniformly lower false positive rates compared with competing libraries, while also demonstrating an improved capacity to explain epigenome-wide variation in DNA methylation within two large publicly available HM450 data sets.
Conclusions: Despite consisting of half as many CpGs as existing libraries for whole-blood mixture deconvolution, the optimized IDOL library identified herein resulted in outstanding prediction performance across all considered data sets and demonstrated potential to improve the operating characteristics of EWAS involving adjustments for cell distribution. In addition to providing the EWAS community with an optimized library for whole-blood mixture deconvolution, our work establishes a systematic and generalizable framework for assembling libraries that improve the accuracy of cell mixture deconvolution.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0943-7) contains supplementary material, which is available to authorized users.
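The agreement statistics above compare deconvolution estimates with flow cytometry measurements, one cell type at a time. A minimal sketch of the two metrics follows, assuming R² is computed as the squared Pearson correlation between estimated and measured fractions (the abstract does not state the exact definition). The percentages in the usage lines are hypothetical.

```python
import math

def rmse(estimated, observed):
    """Root mean square error between paired estimates and measurements."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)

def r_squared(estimated, observed):
    """Squared Pearson correlation between estimates and measurements."""
    n = len(observed)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_e * var_o)

# Hypothetical neutrophil percentages: deconvolution estimates vs. flow cytometry
estimated = [55.0, 62.0, 48.0, 70.0]
flow = [54.0, 63.0, 47.5, 71.0]
print(f"RMSE = {rmse(estimated, flow):.2f}%, R^2 = {r_squared(estimated, flow):.3f}")
```

Because a squared correlation measures only linear association, it can be close to 1 even when estimates are systematically shifted; RMSE captures that kind of bias, so the two metrics are complementary.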
Purpose: Epigenetic alterations, including changes to cellular DNA methylation levels, contribute to carcinogenesis and may serve as powerful biomarkers of the disease. This investigation sought to determine whether hypomethylation of long interspersed nuclear elements (LINE1) in peripheral blood-derived DNA, which reflects the level of global DNA methylation, is associated with increased risk of bladder cancer.
Experimental Design: LINE1 methylation was measured in blood-derived DNA obtained from participants in a population-based incident case-control study of bladder cancer in New Hampshire. Bisulfite-modified DNA was pyrosequenced to determine LINE1 methylation status; a total of 285 cases and 465 controls were evaluated.
Results: Being in the lowest LINE1 methylation decile was associated with a 1.8-fold increased risk of bladder cancer [95% confidence interval (95% CI), 1.12-2.90] in models controlling for gender, age, and smoking, and the association was stronger in women than in men (odds ratio, 2.48; 95% CI, 1.19-5.17 in women; odds ratio, 1.47; 95% CI, 0.79-2.74 in men). Among controls, women were more likely than men to have lower LINE1 methylation (P = 0.04), and arsenic levels in the 90th percentile were associated with reduced LINE1 methylation (P = 0.04).
Conclusions: LINE1 hypomethylation may be an important biomarker of bladder cancer risk, especially among women.
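The odds ratios above come from models adjusted for gender, age, and smoking, which a simple 2 × 2 table cannot reproduce. As a sketch of how an odds ratio relates to its 95% confidence interval, the code below computes a crude (unadjusted) odds ratio with a Wald interval on the log scale. The counts and the function name are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, with "exposed" meaning the lowest methylation decile
print(odds_ratio_ci(a=30, b=20, c=70, d=80))
```

The interval is symmetric on the log scale, which is why published intervals such as 1.12-2.90 around 1.8 look asymmetric on the ratio scale.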
Purpose: The central role of microRNAs as regulators of translation has been well established, whereas the relationship between genetic variation in microRNAs and disease risk is only beginning to be explored. A polymorphism in the MIR196A2 locus has shown associations with lung, breast, esophageal, and gastric tumors but has not been examined in head and neck cancers, whose pathology and etiology are similar to those of these diseases.
Experimental Design: We studied a polymorphism in the mature sequence of MIR196A2 (rs11614913, C/T) in a population-based case-control study (n = 1,039) of head and neck squamous cell carcinoma (HNSCC) to determine whether MIR196A2 genotype was associated with disease occurrence and patient survival.
Results: Presence of any variant allele was associated with a significantly reduced risk of HNSCC (odds ratio, 0.8; 95% confidence interval, 0.56-0.99). Homozygous variant allele carriers with pharyngeal tumors had significantly reduced survival compared with wild-type and heterozygous cases (hazard ratio, 7.4; 95% confidence interval, 1.9-28.2). Expression analysis in a subset of tumors (n = 83) revealed no significant difference in relative expression of either miR-196a or miR-196a* by MIR196A2 genotype.
Conclusion: These data demonstrate a role for MIR196A2 genotype in susceptibility to and prognosis of HNSCC.
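The two genetic comparisons in the Results use different genotype codings: the risk estimate contrasts carriers of any variant allele with non-carriers (a dominant coding), while the survival estimate contrasts homozygous variant carriers with wild-type and heterozygous cases (a recessive coding). A minimal sketch, assuming genotypes are recorded as the number of variant alleles (0, 1, or 2); the function name is hypothetical.

```python
def code_genotype(n_variant, model):
    """Return a 0/1 indicator for a genotype under a dominant or recessive model.

    n_variant: number of variant alleles carried (0, 1, or 2).
    """
    if n_variant not in (0, 1, 2):
        raise ValueError("n_variant must be 0, 1, or 2")
    if model == "dominant":
        # Any variant allele (heterozygous or homozygous) vs. none
        return int(n_variant >= 1)
    if model == "recessive":
        # Homozygous variant vs. wild-type and heterozygous
        return int(n_variant == 2)
    raise ValueError(f"unknown model: {model!r}")

# Wild-type, heterozygous, homozygous variant
print([code_genotype(g, "dominant") for g in (0, 1, 2)])   # [0, 1, 1]
print([code_genotype(g, "recessive") for g in (0, 1, 2)])  # [0, 0, 1]
```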
Purpose: The human epigenome is profoundly altered in cancers, with a characteristic loss of methylation in repetitive regions and concomitant accumulation of gene-promoter methylation. The degree to which these processes are coordinated is unclear, so we investigated both in head and neck squamous cell carcinomas.
Experimental Design: Global methylation was measured in a series of 138 tumors using the luminometric methylation assay (LUMA) and by pyrosequencing of LINE-1Hs and AluYb8 repetitive elements. We also measured methylation of over 27,000 CpG loci with the Illumina HumanMethylation27 microarray (n = 91).
Results: LINE-1 methylation was significantly associated with LUMA and Infinium loci methylation (Spearman's rho = 0.52 and rho = 0.56, respectively; both p < 0.001), but not with AluYb8 methylation. Methylation of LINE-1, AluYb8, and Infinium loci differed by tumor site (each Kruskal-Wallis p < 0.05). In addition, LINE-1 and LUMA methylation were associated with HPV16 E6 serology (each Mann-Whitney p < 0.05). Comparing LINE-1 methylation to gene-associated methylation, we identified a distinct subset of CpG loci whose hypermethylation was significantly associated with LINE-1 hypomethylation. An investigation of sequence features revealed that these CpG loci were significantly less likely to reside in repetitive elements (GSEA p < 0.02), were enriched in CpG islands (p < 0.001), and were proximal to transcription factor binding sites (p < 0.05). We validated the top CpG loci with significant hypermethylation associated with LINE-1 hypomethylation (at EVI2A, IFRD1, KLHL6, and PTPRCAP) by pyrosequencing independent tumors.
Conclusions: These data indicate that global hypomethylation and gene-specific methylation processes are associated in a sequence-dependent manner, and that clinical characteristics and exposures leading to HNSCC may influence these processes.
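Spearman's rho, used above to relate LINE-1 methylation to the LUMA and Infinium measures, is the Pearson correlation computed on ranks, so it captures monotone rather than strictly linear association. A minimal pure-Python sketch with average ranks for ties; the function names and the paired values in the usage line are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    return cov / (ssx * ssy) ** 0.5

# Hypothetical per-tumor LINE-1 methylation (%) and mean Infinium beta-values
print(spearman_rho([62.1, 70.4, 58.3, 75.0, 66.2], [0.41, 0.45, 0.39, 0.52, 0.46]))
```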
DNA methylation microarrays can be employed to interrogate cell-type composition in complex tissues. Here, we expand reference-based deconvolution of blood DNA methylation to include 12 leukocyte subtypes (neutrophils, eosinophils, basophils, monocytes, naïve and memory B cells, naïve and memory CD4+ and CD8+ T cells, natural killer cells, and T regulatory cells). Including derived variables, our method provides 56 immune profile variables. The IDOL (IDentifying Optimal Libraries) algorithm was used to identify libraries for deconvolution of DNA methylation data from current and previous array platforms. The accuracy of deconvolution estimates obtained using our enhanced libraries was validated using artificial mixtures and whole-blood DNA methylation data with known cellular composition from flow cytometry. We applied our libraries to deconvolve cancer, aging, and autoimmune disease datasets. In conclusion, these libraries enable a detailed representation of immune-cell profiles in blood using only DNA and facilitate a standardized, thorough investigation of immune profiles in human health and disease.
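The abstract does not list which derived variables make up the 56 immune profile variables. As one illustration of how a derived variable can be computed from deconvolution output, the sketch below sums the lymphocyte subtypes and forms a neutrophil-to-lymphocyte ratio. The dictionary keys, the choice of ratio, and the fractions are all hypothetical.

```python
# Hypothetical key names for the lymphocyte subtypes among the 12 estimated fractions
LYMPHOCYTE_KEYS = (
    "b_naive", "b_memory",
    "cd4_naive", "cd4_memory",
    "cd8_naive", "cd8_memory",
    "nk", "treg",
)

def derived_variables(fractions):
    """Compute illustrative derived variables from estimated cell fractions."""
    lymphocytes = sum(fractions[k] for k in LYMPHOCYTE_KEYS)
    if lymphocytes <= 0:
        raise ValueError("total lymphocyte fraction must be positive")
    return {
        "lymphocytes": lymphocytes,
        # Neutrophil-to-lymphocyte ratio
        "nlr": fractions["neutrophil"] / lymphocytes,
    }

# Hypothetical deconvolution output for one sample (fractions sum to 1)
estimates = {
    "neutrophil": 0.60, "eosinophil": 0.03, "basophil": 0.01, "monocyte": 0.06,
    "b_naive": 0.03, "b_memory": 0.02, "cd4_naive": 0.08, "cd4_memory": 0.07,
    "cd8_naive": 0.03, "cd8_memory": 0.03, "nk": 0.03, "treg": 0.01,
}
print(derived_variables(estimates))
```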