Protein biomarkers have been identified across many age-related morbidities. However, characterising epigenetic influences could further inform disease predictions. Here, we leverage epigenome-wide data to study links between the DNAm signatures of the circulating proteome and incident diseases. Using data from four cohorts, we trained and tested epigenetic scores (EpiScores) for 953 plasma proteins, identifying 109 scores that explained between 1% and 58% of the variance in protein levels after adjusting for known protein quantitative trait loci (pQTL) genetic effects. By projecting these EpiScores into an independent sample (Generation Scotland; n=9,537) and relating them to incident morbidities over a follow-up of 14 years, we uncovered 137 EpiScore-disease associations. These associations were largely independent of immune cell proportions, common lifestyle and health factors, and biological aging. Notably, we found that our diabetes-associated EpiScores highlighted previous top biomarker associations from proteome-wide assessments of diabetes. These EpiScores for protein levels can therefore be a valuable resource for disease prediction and risk stratification.
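To make "projecting" an EpiScore into a new sample concrete, here is a minimal sketch. It assumes an EpiScore is a linear weighted sum of CpG methylation values, as is typical of penalised-regression predictors; the abstract does not state the model form. The function name `epi_score`, the CpG labels, and the weights are all hypothetical and exist only for illustration.

```python
from typing import Dict


def epi_score(
    weights: Dict[str, float],
    methylation: Dict[str, float],
    intercept: float = 0.0,
) -> float:
    """Return intercept + sum(weight * methylation) over the CpGs in `weights`.

    CpGs that are absent from the sample are skipped (an assumption of this
    sketch; a real pipeline might impute them instead).
    """
    return intercept + sum(
        w * methylation[cpg] for cpg, w in weights.items() if cpg in methylation
    )


# Hypothetical weights for one protein's score (not taken from the study).
weights = {"cpg_a": 0.5, "cpg_b": -0.25, "cpg_c": 1.0}

# Methylation values (proportion methylated, 0..1) for one individual in the
# independent sample; "cpg_c" was not measured here.
sample = {"cpg_a": 0.8, "cpg_b": 0.4}

score = epi_score(weights, sample)
```

Scores computed this way for every individual could then be entered as covariates in a time-to-event model of incident disease.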
Background: DNA methylation is a dynamic epigenetic mechanism that occurs at cytosine-phosphate-guanine dinucleotide (CpG) sites. Epigenome-wide association studies (EWAS) investigate the strength of association between methylation at individual CpG sites and health outcomes. Although blood methylation may act as a peripheral marker of common disease states, previous EWAS have typically focused only on individual conditions and have had limited power to discover disease-associated loci. This study examined the association of blood DNA methylation with the prevalence of 14 disease states and the incidence of 19 disease states in a single population of over 18,000 Scottish individuals.
Methods and findings: DNA methylation was assayed at 752,722 CpG sites in whole-blood samples from 18,413 volunteers in the family-structured, population-based cohort study Generation Scotland (age range 18 to 99 years). EWAS tested for cross-sectional associations between baseline CpG methylation and 14 prevalent disease states, and for longitudinal associations between baseline CpG methylation and 19 incident disease states. Prevalent cases were self-reported on health questionnaires at baseline. Incident cases were identified using linkage to Scottish primary (Read 2) and secondary (ICD-10) care records, and the censoring date was set to October 2020. The mean time-to-diagnosis ranged from 5.0 years (for chronic pain) to 11.7 years (for Coronavirus Disease 2019 (COVID-19) hospitalisation). The 19 disease states considered in this study were selected if they appeared among the World Health Organisation’s 10 leading causes of death and disease burden or were included in baseline self-report questionnaires. EWAS models were adjusted for age at methylation typing, sex, estimated white blood cell composition, population structure, and 5 common lifestyle risk factors. A structured literature review was also conducted to identify existing EWAS for all 19 disease states tested. MEDLINE, Embase, Web of Science, and preprint servers were searched to retrieve relevant articles indexed as of March 27, 2023. Fifty-four of approximately 2,000 indexed articles met our inclusion criteria: they assayed blood-based DNA methylation, had >20 individuals in each comparison group, and examined one of the 19 conditions considered. First, we assessed whether the associations identified in our study were reported in previous studies. We identified 69 associations between CpGs and the prevalence of 4 conditions, of which 58 were newly described. The conditions were breast cancer, chronic kidney disease, ischemic heart disease, and type 2 diabetes mellitus. We also uncovered 64 CpGs that were associated with the incidence of 2 disease states (COPD and type 2 diabetes), of which 56 were not reported in the surveyed literature. Second, we assessed replication across existing studies, which was defined as the reporting of at least 1 common site in >2 studies that examined the same condition. Only 6/19 disease states had evidence of such replication. The limitations of this study include the nonconsideration of medication data and a potential lack of generalizability to individuals who are not of Scottish and European ancestry.
Conclusions: We discovered over 100 associations between blood methylation sites and common disease states, independently of major confounding risk factors, and identified a need for greater standardisation among EWAS on human disease.
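The replication criterion stated above (at least 1 common site reported in >2 studies of the same condition) can be sketched as a simple count. This is an illustrative reading of that definition, not the authors' code; the function names, study labels, and site labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def replicated_sites(
    studies: Dict[str, Iterable[str]], min_studies: int = 3
) -> Set[str]:
    """Return sites reported by at least `min_studies` studies of one condition.

    ">2 studies" is interpreted as 3 or more. Each site is counted at most
    once per study, even if a study lists it repeatedly.
    """
    counts: Counter = Counter()
    for sites in studies.values():
        counts.update(set(sites))
    return {site for site, n in counts.items() if n >= min_studies}


def has_replication(studies: Dict[str, Iterable[str]], min_studies: int = 3) -> bool:
    """True if at least one site meets the replication threshold."""
    return bool(replicated_sites(studies, min_studies))


# Hypothetical reported sites for a single condition across three studies.
example_studies = {
    "study_1": ["site_a", "site_b"],
    "study_2": ["site_a", "site_c"],
    "study_3": ["site_a", "site_b", "site_b"],
}
common = replicated_sites(example_studies)
```

In this toy input only `site_a` appears in three studies; `site_b` appears in two and so falls short of the threshold.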
Type 2 diabetes mellitus (T2D) is one of the most prevalent diseases in the world and presents a major health and economic burden, a notable proportion of which could be alleviated with improved early prediction and intervention. While standard risk factors including age, obesity, and hypertension have shown good predictive performance, we show that the use of CpG DNA methylation information leads to a significant improvement in the prediction of 10-year T2D incidence risk. Whilst previous studies have been largely constrained by linear assumptions and the use of CpGs one at a time, we have adopted a more flexible approach based on a range of linear and tree-ensemble models for classification and time-to-event prediction. Using the Generation Scotland cohort (n=9,537), our best performing model (Area Under the Curve (AUC)=0.880, Precision Recall AUC (PRAUC)=0.539, McFadden’s R²=0.316) used a LASSO Cox proportional-hazards predictor and showed notable improvement in onset prediction, above and beyond standard risk factors (AUC=0.860, PRAUC=0.444, R²=0.261). Replication of the main finding was observed in an external test dataset (the German-based KORA study, p=3.7×10⁻⁴). Tree-ensemble methods provided comparable performance, and future improvements to these models are discussed. Finally, we introduce MethylPipeR, an R package with accompanying user interface, for systematic and reproducible development of complex trait and incident disease predictors. While MethylPipeR was applied to incident T2D prediction with DNA methylation in our experiments, the package is designed for generalised development of predictive models and is applicable to a wide range of omics data and target traits.
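For readers unfamiliar with the three reported metrics, the sketch below computes them from binary outcomes and predicted probabilities using their standard definitions. It is not MethylPipeR code, and the abstract does not say exactly how each metric was estimated. Two assumptions are made here: the precision-recall area is estimated as average precision, and McFadden's R² uses a binary-outcome log-likelihood rather than a Cox partial likelihood. The data are invented.

```python
import math
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outranks a random non-case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each case (a PR-AUC estimate)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one case")
    return total / hits


def mcfadden_r2(labels: Sequence[int], probs: Sequence[float]) -> float:
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    eps = 1e-12  # clip probabilities to avoid log(0)
    base = sum(labels) / len(labels)
    if base in (0.0, 1.0):
        raise ValueError("need at least one case and one non-case")

    def log_lik(ps: Sequence[float]) -> float:
        return sum(
            y * math.log(min(max(p, eps), 1 - eps))
            + (1 - y) * math.log(1 - min(max(p, eps), 1 - eps))
            for y, p in zip(labels, ps)
        )

    return 1.0 - log_lik(probs) / log_lik([base] * len(labels))


# Invented outcomes (1 = incident case) and predicted 10-year risks.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.1]
auc = roc_auc(y_true, y_prob)
pr_auc = average_precision(y_true, y_prob)
r2 = mcfadden_r2(y_true, y_prob)
```

Comparing these values for a risk-factor-only model against a model that also includes methylation features mirrors the kind of comparison reported above.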