Alzheimer’s disease is a neurodegenerative disorder and the most common form of dementia. Early diagnosis may support interventions that delay onset and slow the progression of the disease. We systematically reviewed the use of machine learning algorithms for predicting Alzheimer’s disease from single nucleotide polymorphisms, alone or combined with other types of data. We evaluated the ability of machine learning models to distinguish between controls and cases, and also assessed their implementation and potential biases. Articles published between December 2009 and June 2020 were collected from Scopus, PubMed and Google Scholar. These were systematically screened for inclusion, leading to a final set of 12 publications. Eighty-five percent of the included studies used the Alzheimer’s Disease Neuroimaging Initiative dataset. In studies that reported the area under the curve, discrimination varied (0.49–0.97). However, more than half of the included manuscripts used other measures, such as accuracy, sensitivity and specificity. Model calibration statistics were also reported inconsistently across studies. The most frequent limitation in the assessed studies was sample size: the total number of participants was often fewer than a thousand, whilst the number of predictors usually ran into many thousands. In addition, key steps in model implementation and validation were often not performed or not reported, making it difficult to assess the capability of machine learning models.
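The review's distinction between discrimination (AUC, or accuracy, sensitivity and specificity at a threshold) and calibration can be made concrete with a minimal sketch, assuming NumPy and scikit-learn. The function name `evaluate_classifier`, the 0.5 decision threshold, the five calibration bins, and the choice of Brier score, calibration slope and calibration-in-the-large are illustrative assumptions, not metrics prescribed by the reviewed studies.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, brier_score_loss,
                             confusion_matrix, roc_auc_score)


def evaluate_classifier(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: report discrimination and calibration together."""
    y_true = np.asarray(y_true)
    # Clip so the logit below stays finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), 1e-6, 1 - 1e-6)
    y_pred = (y_prob >= threshold).astype(int)

    # Threshold-dependent metrics (the kind most reviewed studies reported).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Calibration slope: regress the outcome on the logit of the prediction
    # (very large C, so effectively unpenalised). A slope near 1 is ideal;
    # a slope below 1 means predictions are too extreme, typical of overfitting.
    logit = np.log(y_prob / (1 - y_prob)).reshape(-1, 1)
    slope = LogisticRegression(C=1e9).fit(logit, y_true).coef_[0, 0]

    # Observed event rate vs mean predicted probability in 5 equal-width bins.
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=5)
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "brier": brier_score_loss(y_true, y_prob),
        # Mean predicted risk minus observed prevalence; 0 is ideal.
        "calibration_in_the_large": y_prob.mean() - y_true.mean(),
        "calibration_slope": slope,
        "calibration_bins": list(zip(mean_pred, frac_pos)),
    }


if __name__ == "__main__":
    # Synthetic, perfectly calibrated predictions, purely for illustration.
    rng = np.random.default_rng(0)
    p = rng.uniform(0.05, 0.95, 5000)
    y = rng.binomial(1, p)
    for name, value in evaluate_classifier(y, p).items():
        if name != "calibration_bins":
            print(f"{name}: {value:.3f}")
```

Reporting AUC alone says nothing about whether predicted risks match observed risks; a model can rank cases well and still be badly miscalibrated, which is why calibration is listed separately here.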
Background
Recent genome-wide studies have identified over 70 risk loci for late-onset Alzheimer’s disease (AD) (Kunkle et al. 2019, Bellenguez et al. 2021, Wightman et al. 2021). This study focused on developing machine learning (ML) models to predict AD risk from genetic data. We compared the prediction accuracy of ML with that of polygenic risk scores (PRS), using SNPs in disease-associated biological pathways.
Method
We used the Genetic and Environmental Risk for Alzheimer’s Disease consortium dataset (Harold et al. 2009). SNPs were selected from the AD-associated pathways reported in Kunkle et al. 2019. Two decision tree-based ML algorithms were used: Random Forests (RF) and Gradient Boosting (GB). Their predictive ability was compared with that of PRS. RF and GB models were developed using the Python library scikit-learn (sklearn), and were trained and tested using 5-fold cross-validation. PRS were generated by clumping and thresholding (CT), as implemented in PLINK, with logistic regression used for prediction. CT PRS was also compared with PRS generated by the PRS-CS method (Ge, T., Chen, CY., Ni, Y. et al.).
Result
Initial results demonstrate that PRS, PRS-CS and ML perform similarly in pathway-specific analyses (AUC ≈ 0.69) when the pathways include APOE.
Conclusion
Subsequent analyses will compare the performance of the methods when APOE SNPs are removed from the pathways, and in multivariate analyses modelling multiple pathway effects simultaneously.
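The comparison described in the Method can be illustrated with a minimal sketch on simulated data, assuming NumPy, SciPy and scikit-learn. It fits RF and GB classifiers on a genotype matrix and a thresholded PRS passed to logistic regression, then compares mean AUC across 5-fold stratified cross-validation. Everything here is hypothetical: the cohort size, allele frequencies, effect sizes, the stand-in GWAS summary statistics, the p < 0.05 threshold and the model settings are illustrative, not the study's. LD clumping in PLINK, PRS-CS and the pathway-based SNP selection are not reproduced.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical toy cohort: 1,000 individuals x 200 SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(42)
n_samples, n_snps = 1000, 200
maf = rng.uniform(0.05, 0.5, n_snps)
X = rng.binomial(2, maf, size=(n_samples, n_snps)).astype(float)

# Simulated per-allele log-odds effects: only the first 10 SNPs are causal.
true_beta = np.zeros(n_snps)
true_beta[:10] = 0.7 * rng.choice([-1.0, 1.0], size=10)
linear_predictor = X @ true_beta
linear_predictor -= linear_predictor.mean()  # centre so roughly half are cases
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-linear_predictor)))

# Stand-in for summary statistics from an independent GWAS (assumed SE of 0.05).
se = 0.05
gwas_beta = true_beta + rng.normal(0.0, se, n_snps)
gwas_p = 2 * norm.sf(np.abs(gwas_beta / se))  # two-sided p-values

# Thresholding step of CT: keep SNPs with p < 0.05. The LD clumping step that
# PLINK performs is omitted because these toy SNPs are simulated independently.
keep = gwas_p < 0.05
prs = X[:, keep] @ gwas_beta[keep]  # weighted sum of risk-allele counts

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "RF": (RandomForestClassifier(n_estimators=200, random_state=0), X),
    "GB": (GradientBoostingClassifier(random_state=0), X),
    # The PRS is a single predictor; logistic regression maps it to case probability.
    "CT PRS + LR": (LogisticRegression(), prs.reshape(-1, 1)),
}

results = {}
for name, (model, features) in models.items():
    # AUC on each of the 5 held-out folds, then averaged.
    scores = cross_val_score(model, features, y, cv=cv, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name}: mean AUC = {results[name]:.3f} (sd {scores.std():.3f})")
```

Because the PRS weights come from a separate (here simulated) GWAS, only the logistic regression intercept and slope are fitted within each fold; if the weights were estimated from the same individuals, the cross-validated AUC would be optimistically biased.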