Patients with Parkinson’s disease (PD) often suffer from cognitive decline. Accurate prediction of cognitive decline is essential for early treatment of at-risk patients. The aim of this study was to develop and evaluate a multimodal machine learning model for the prediction of continuous cognitive decline in patients with early PD. We included 213 PD patients from the Parkinson’s Progression Markers Initiative (PPMI) database. Machine learning was used to predict the change in Montreal Cognitive Assessment (MoCA) score, with the difference between baseline and 4-year follow-up scores as the outcome. Input features were categorized into four sets: clinical test scores, cerebrospinal fluid (CSF) biomarkers, brain volumes, and genetic variants. All combinations of input feature sets were added to a basic model, which consisted of demographics and baseline cognition. An iterative scheme using RReliefF-based feature ranking and support vector regression in combination with tenfold cross-validation was used to determine the optimal number of predictive features and to evaluate model performance for each combination of input feature sets. Our best-performing model consisted of a combination of the basic model, clinical test scores, and CSF-based biomarkers. This model had 12 features, which included baseline cognition, CSF phosphorylated tau, CSF total tau, CSF amyloid-beta1-42, geriatric depression scale (GDS) scores, and anxiety scores. Interestingly, many of the predictive features in our model have previously been associated with Alzheimer’s disease, showing the importance of assessing Alzheimer’s disease pathology in patients with Parkinson’s disease.
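The iterative scheme described above (rank features, fit a support vector regressor on the top k, and pick k by tenfold cross-validation) could look roughly like the following sketch. Several parts are assumptions rather than the study's method: a univariate F-score ranking from scikit-learn stands in for RReliefF, the linear kernel and default SVR hyperparameters are arbitrary, mean absolute error is used as the performance measure, ranking is done inside each training fold, and the data are synthetic rather than PPMI. The function name `cv_error_by_feature_count` is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def cv_error_by_feature_count(X, y, max_features, n_splits=10, seed=0):
    """Mean cross-validated MAE for each candidate feature count k = 1..max_features."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errors = np.zeros((n_splits, max_features))
    for i, (train, test) in enumerate(folds.split(X)):
        # Rank features on the training fold only, so the held-out fold
        # does not influence which features are selected.
        # Assumption: F-score ranking is a stand-in for RReliefF.
        scores, _ = f_regression(X[train], y[train])
        ranking = np.argsort(scores)[::-1]
        for k in range(1, max_features + 1):
            top = ranking[:k]
            # Assumption: linear kernel with default hyperparameters.
            model = make_pipeline(StandardScaler(), SVR(kernel="linear"))
            model.fit(X[train][:, top], y[train])
            pred = model.predict(X[test][:, top])
            errors[i, k - 1] = np.mean(np.abs(pred - y[test]))
    return errors.mean(axis=0)


# Synthetic stand-in data (not PPMI): 213 samples, 20 features, 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(213, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=213)

mean_mae = cv_error_by_feature_count(X, y, max_features=10)
best_k = int(np.argmin(mean_mae)) + 1  # feature count with the lowest mean error
```

Repeating this loop for each combination of input feature sets, and comparing the best cross-validated error across combinations, would mirror the comparison the abstract describes.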
Parkinson's disease (PD) is the second most common neurodegenerative disease, affecting 2-3% of the population over 65 years of age. Considerable research has investigated the benefit of using neuroimaging to improve PD diagnosis. However, it is challenging for medical experts to manually identify the subtle differences associated with PD in such complex data. It has been shown that machine learning models can achieve human-like accuracies for many computer-aided diagnosis applications. However, model performance usually depends on the amount and diversity of training data available, whereas most PD classification models were trained on rather small datasets. Training data size and diversity can be increased by curating multi-site datasets. However, this may also increase biological and non-biological variances due to differences in participant cohorts, scanners, and data acquisition protocols. Thus, data harmonization is important to reduce those variances and enable the models to focus primarily on the patterns associated with PD. This work compares intensity harmonization techniques on 1796 MRI scans from twelve studies. Our results show that a histogram matching approach does not improve classification accuracy (78%) compared to the model trained on unharmonized data (baseline). However, it reduces the disparity between sensitivity and specificity from 81% and 73% to 77% and 79%, respectively. Moreover, combining the histogram matching and least squares mean tissue intensity harmonization methods outperforms the baseline model (accuracy of 74% compared to 67%) on an independent test set. Finally, our analysis considering sex (male, female) and groups (PD, healthy) shows that models trained on harmonized data exhibited reduced performance disparities between groups, which may be interpreted as a form of bias mitigation.
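To make the idea of histogram matching concrete, the following is a minimal sketch of the generic technique: each source intensity is mapped to the reference intensity at the same cumulative-distribution position. It is not necessarily the variant evaluated in the study, which is not specified here. The function name `match_histogram` is hypothetical, and the arrays are random stand-ins for scans from two sites.

```python
import numpy as np


def match_histogram(source, reference):
    """Map source intensities so their distribution follows the reference."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)


# Synthetic stand-ins for two scans with different intensity scales.
rng = np.random.default_rng(0)
source = rng.normal(loc=100.0, scale=20.0, size=(32, 32))
reference = rng.normal(loc=50.0, scale=5.0, size=(32, 32))

matched = match_histogram(source, reference)
```

Because the mapping is monotonic, the rank order of voxel intensities within the source image is preserved while its overall intensity distribution is shifted toward the reference.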