Detection and diagnosis of early and subclinical stages of Alzheimer's Disease (AD) play an essential role in the implementation of intervention and prevention strategies. Neuroimaging techniques predominantly provide insight into the anatomical structure changes associated with AD. Deep learning methods have been extensively applied to create and evaluate models capable of differentiating between cognitively unimpaired individuals, patients with Mild Cognitive Impairment (MCI), and patients with AD dementia. Several published approaches apply information fusion techniques, which combine several input sources in the medical domain and thereby yield broader and richer information. The aim of this paper is to fuse sociodemographic data, such as age, marital status, education and gender, and genetic data (presence of an apolipoprotein E (APOE)-ε4 allele) with Magnetic Resonance Imaging (MRI) scans. This yields enriched multi-modal features that adequately represent the MRI scan visually and are used to build classification systems capable of detecting amnestic MCI (aMCI). To fully exploit the potential of deep convolutional neural networks, two extra color layers, a contrast-intensified and a blurred adaptation of the image, are virtually added to each MRI scan, completing the Red-Green-Blue (RGB) color channels. Deep convolutional activation features (DeCAF) are extracted from the average pooling layer of the deep learning system Inception_v3. These features from the fused MRI scans serve as the visual representation for the Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) classification model. The proposed approach is evaluated on a sub-study containing 120 participants (aMCI = 61 and cognitively unimpaired = 59) of the Heinz Nixdorf Recall (HNR) Study with a baseline model
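The three-channel input described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the contrast-intensified and blurred layers are computed, so a percentile-based linear contrast stretch and a k × k box (mean) filter are assumed here, and the channel order (original, contrast, blur) is likewise an assumption. The helper names `contrast_stretch`, `box_blur` and `to_pseudo_rgb` are hypothetical.

```python
import numpy as np


def contrast_stretch(img: np.ndarray, low_pct: float = 2.0, high_pct: float = 98.0) -> np.ndarray:
    """Assumed contrast intensification: clip to percentiles, rescale to [0, 1]."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros_like(img, dtype=np.float64)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


def box_blur(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Assumed blurring: k x k mean filter with edge-replicated padding."""
    assert k % 2 == 1, "use an odd window so the filter stays centered"
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    h, w = img.shape
    # Sum every shifted copy of the image that falls inside the window.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def to_pseudo_rgb(slice_2d: np.ndarray) -> np.ndarray:
    """Stack a grayscale MRI slice with two derived layers into an H x W x 3 array."""
    base = slice_2d.astype(np.float64)
    span = base.max() - base.min()
    norm = (base - base.min()) / span if span > 0 else np.zeros_like(base)
    # Channel order (original, contrast-intensified, blurred) is an assumption.
    return np.stack([norm, contrast_stretch(norm), box_blur(norm)], axis=-1)
```

For example, `to_pseudo_rgb(np.random.default_rng(0).random((64, 64)))` returns an array of shape `(64, 64, 3)` with values in [0, 1]. Before DeCAF extraction, such an image would still need to be resized to the input size expected by the network (Inception_v3 is commonly fed 299 × 299 inputs).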
Background
For the recruitment and monitoring of subjects for therapy studies, it is important to predict whether subjects with mild cognitive impairment (MCI) will prospectively develop Alzheimer’s disease (AD). Machine learning (ML) can help improve early AD prediction. The etiology of AD is heterogeneous, which leads to high variability in disease patterns. Further variability originates from multicentric study designs, varying acquisition protocols, and errors in the preprocessing of magnetic resonance imaging (MRI) scans. This high variability makes it difficult to separate signal from noise and may lead to overfitting. This article examines whether an automatic and fair data valuation method based on Shapley values can identify the most informative subjects and thereby improve ML classification.
Methods
An ML workflow was developed and trained on a subset of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. Validation was performed on an independent ADNI test set and on the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric MRI feature extraction, feature selection, sample selection using Data Shapley, random forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation.
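The sample-selection step can be illustrated with a Monte Carlo permutation estimate of Data Shapley values. The sketch below is hedged in several ways: the study trained RF and XGBoost models, whereas here a dependency-free nearest-centroid classifier stands in as the learning algorithm; the value of the empty training set is assumed to be 0; and the truncation step of the original truncated Monte Carlo formulation (stopping a permutation early once marginal gains become negligible) is omitted. All function names are hypothetical. For each random permutation, training points are added one at a time, and each point is credited with the change in validation accuracy it causes.

```python
import numpy as np


def nearest_centroid_accuracy(X_tr, y_tr, X_val, y_val):
    """Utility v(S): validation accuracy of a nearest-centroid model fit on S."""
    if len(y_tr) == 0:
        return 0.0  # assumed value of the empty training set
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_val).mean())


def monte_carlo_data_shapley(X_tr, y_tr, X_val, y_val, n_perms=200, seed=0):
    """Average marginal contribution of each training point over random orderings."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = 0.0
        for k in range(1, n + 1):
            idx = perm[:k]
            cur = nearest_centroid_accuracy(X_tr[idx], y_tr[idx], X_val, y_val)
            values[perm[k - 1]] += cur - prev  # credit the newly added point
            prev = cur
    return values / n_perms


def drop_lowest(values, n_drop):
    """Sorted indices of the points kept after removing the n_drop lowest-valued ones."""
    return np.sort(np.argsort(values)[n_drop:])


def choose_cutoff(values, X_tr, y_tr, X_cut, y_cut, candidates):
    """Pick how many low-valued points to drop by accuracy on a separate held-out set."""
    best_n, best_acc = 0, -1.0
    for n_drop in candidates:
        keep = drop_lowest(values, n_drop)
        acc = nearest_centroid_accuracy(X_tr[keep], y_tr[keep], X_cut, y_cut)
        if acc > best_acc:
            best_n, best_acc = n_drop, acc
    return best_n, best_acc
```

A useful sanity check follows from the construction: within every permutation the marginal contributions telescope to v(full) − v(∅), so the estimated values always sum to the accuracy of the model trained on all points. The held-out arrays passed to `choose_cutoff` mirror the use of an independent validation set for choosing the cutoff.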
Results
The RF models, which excluded 134 of the 467 training subjects based on their RF Data Shapley values, outperformed the base models, which reached a mean accuracy of 62.64%, by 5.76% (3.61 percentage points) on the independent ADNI test set. The XGBoost base models reached a mean accuracy of 60.00% on the AIBL data set. Excluding the 133 subjects with the smallest RF Data Shapley values improved the classification accuracy by 2.98% (1.79 percentage points). The cutoff values were calculated using an independent validation set.
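The relative and absolute improvements reported above are consistent with each other: the relative gain is the percentage-point gain divided by the base accuracy. The resulting accuracies on the right are derived from the reported figures rather than stated in the results.

```latex
\[
\frac{3.61}{62.64} \approx 0.0576 = 5.76\,\%,
\qquad 62.64\,\% + 3.61\ \text{pp} = 66.25\,\% \quad \text{(ADNI test set, RF)}
\]
\[
\frac{1.79}{60.00} \approx 0.0298 = 2.98\,\%,
\qquad 60.00\,\% + 1.79\ \text{pp} = 61.79\,\% \quad \text{(AIBL, XGBoost)}
\]
```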
Conclusion
The Data Shapley method improved the mean accuracies on both test sets. The most informative subjects were associated with the number of apolipoprotein E ε4 (ApoE ε4) alleles, cognitive test results, and volumetric MRI measurements.