Machine Learning Approach for Improved Longitudinal Prediction of Progression from Mild Cognitive Impairment to Alzheimer’s Disease
Robert P. Adelson,
Anurag Garikipati,
Jenish Maharjan
et al.
Abstract: Mild cognitive impairment (MCI) is cognitive decline that can indicate future risk of Alzheimer’s disease (AD). We developed and validated a machine learning algorithm (MLA), based on a gradient-boosted tree ensemble method, to analyze phenotypic data for individuals 55–88 years old (n = 493) diagnosed with MCI. Data were analyzed within multiple prediction windows and averaged to predict progression to AD within 24–48 months. The MLA outperformed the mini-mental state examination (MMSE) and three comparison m…
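The gradient-boosted tree approach the abstract describes can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the study's implementation: the features, labels, and scikit-learn's `GradientBoostingClassifier` (rather than the authors' exact ensemble) are all assumptions.

```python
# Sketch: gradient-boosted trees predicting a binary progression outcome
# from tabular phenotypic features. Synthetic data is illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 493                               # cohort size from the abstract
X = rng.normal(size=(n, 5))           # hypothetical features (age, scores, ...)
# Toy labels driven by the first two features plus noise:
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"held-out AUROC: {auc:.2f}")
```

In practice, progression studies like this one also average predictions over multiple time windows; that step is omitted here for brevity.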
“…Many existing studies focused on using AI techniques to predict later stages of cognitive decline 7,9,32–34. A few studies were about NLP techniques for detecting cognitive decline.…”
Background: Early detection of cognitive decline in elderly individuals facilitates clinical trial enrollment and timely medical interventions. This study aims to apply, evaluate, and compare advanced natural language processing techniques for identifying signs of cognitive decline in clinical notes. Methods: This study, conducted at Mass General Brigham (MGB), Boston, MA, included clinical notes from the 4 years prior to an initial mild cognitive impairment (MCI) diagnosis in 2019 for patients ≥ 50 years old. Note sections regarding cognitive decline were labeled manually. A random sample of 4,949 note sections filtered with cognitive function-related keywords was used for traditional AI model development, and a random subset of 200 was used for LLM and prompt development; another random sample of 1,996 note sections without keyword filtering was used for testing. Prompt templates for large language models (LLMs), Llama 2 on Amazon Web Services and GPT-4 on Microsoft Azure, were developed with multiple prompting approaches to select the optimal LLM-based method. Baseline comparisons were made with XGBoost and a hierarchical attention-based deep neural network model. An ensemble of the three models was then constructed using a majority vote. Results: GPT-4 demonstrated superior accuracy and efficiency to Llama 2. The ensemble model outperformed the individual models, achieving a precision of 90.3%, recall of 94.2%, and F1-score of 92.2%. Notably, the ensemble model demonstrated a marked improvement in precision (from a 70%–79% range to above 90%) compared with the best-performing single model. Error analysis revealed that 63 samples were wrongly predicted by at least one model; however, only 2 cases (3.2%) were mutual errors across all models, indicating diverse error profiles among them. Conclusion: Our findings indicate that LLMs and traditional models exhibit diverse error profiles.
The ensemble of LLMs and locally trained machine learning models on EHR data was found to be complementary, enhancing performance and improving diagnostic accuracy.
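The abstract's ensemble combines three models (an LLM, XGBoost, and a hierarchical attention network) by majority vote over their binary predictions. A minimal sketch of that voting step, with hypothetical stand-in prediction arrays:

```python
# Sketch: element-wise majority vote over an odd number of 0/1 predictions.
import numpy as np

def majority_vote(*preds: np.ndarray) -> np.ndarray:
    """Return 1 where more than half of the prediction arrays say 1."""
    stacked = np.stack(preds)                          # shape: (n_models, n_samples)
    return (stacked.sum(axis=0) > stacked.shape[0] // 2).astype(int)

# Hypothetical outputs of the three models on five note sections:
p_llm   = np.array([1, 0, 1, 1, 0])
p_xgb   = np.array([1, 0, 0, 1, 1])
p_hattn = np.array([0, 0, 1, 1, 0])
print(majority_vote(p_llm, p_xgb, p_hattn))  # -> [1 0 1 1 0]
```

Voting works well precisely when the members' error profiles are diverse, which is the complementarity the study reports.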
“…We think that a combination of strategies can be employed to further improve model performance, with the end goal of developing practical and accessible tools that can be utilized in real-world clinical scenarios to provide additional resources for physicians, therefore enabling better health outcomes for patients. Furthermore, we think that integrating LLM capabilities with other medical AI algorithms, which already show promise in diagnostics and treatment delivery 37,38, for example, can provide extremely powerful yet accessible tools that present the additional advantage of ease of implementation in healthcare workflows.…”
LLMs can accomplish specialized medical knowledge tasks; however, equitable access is hindered by extensive fine-tuning requirements, the need for specialized medical data, and limited access to proprietary models. Open-source (OS) medical LLMs show performance improvements and provide the transparency and compliance required in healthcare. We present OpenMedLM, a prompting platform delivering state-of-the-art (SOTA) performance for OS LLMs on medical benchmarks. We evaluated OS foundation LLMs (7B–70B) on medical benchmarks (MedQA, MedMCQA, PubMedQA, MMLU medical subset) and selected Yi 34B for developing OpenMedLM. Prompting strategies included zero-shot, few-shot, chain-of-thought, and ensemble/self-consistency voting. OpenMedLM delivered OS SOTA results on three medical LLM benchmarks, surpassing previous best-performing OS models that leveraged costly and extensive fine-tuning. OpenMedLM presents the first results to date demonstrating the ability of OS foundation models to optimize performance without specialized fine-tuning. The model achieved 72.6% accuracy on MedQA, outperforming the previous SOTA by 2.4%, and 81.7% accuracy on the MMLU medical subset, establishing itself as the first OS LLM to surpass 80% accuracy on this benchmark. Our results highlight medical-specific emergent properties in OS LLMs not documented elsewhere to date and validate the ability of OS models to accomplish healthcare tasks, highlighting the benefits of prompt engineering to improve the performance of accessible LLMs for medical applications.
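The self-consistency voting strategy named in the abstract samples several chain-of-thought completions at nonzero temperature and keeps the most frequent final answer. A minimal sketch of that step; `sample_answer` here is a hypothetical stand-in for a real LLM call, not OpenMedLM's actual interface:

```python
# Sketch: self-consistency voting over sampled LLM answers.
from collections import Counter

def self_consistency(sample_answer, prompt: str, k: int = 5) -> str:
    """Sample k answers for the same prompt and return the most common one."""
    votes = Counter(sample_answer(prompt) for _ in range(k))
    return votes.most_common(1)[0][0]

# Toy stand-in "model" whose sampled answers occasionally disagree:
samples = iter(["B", "B", "C", "B", "A"])
answer = self_consistency(lambda prompt: next(samples), "Which drug ...?", k=5)
print(answer)  # -> B
```

The idea is that reasoning errors tend to scatter across different wrong answers, while correct reasoning paths converge, so the mode of the samples is more reliable than any single completion.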
“…Machine learning can encompass various clinical data, not just MRI data. Previous research has utilised cognitive data, activities of daily living, and behavioural and psychological symptoms of dementia to differentiate between MCI and Alzheimer's disease (AD) through machine learning [13], and diverse biomarkers and clinical data have been employed to predict the prognosis of MCI through machine learning [14]. Additionally, studies using physiological data from wearable devices have shown results in predicting cognitive function in MCI [15].…”
In patients with mild cognitive impairment (MCI), a lower level of cognitive function is associated with a higher likelihood of progression to dementia. In addition, gait disturbances and structural changes on brain MRI scans reflect cognitive levels. Therefore, we aimed to classify MCI based on cognitive level using gait parameters and brain MRI data. Eighty patients diagnosed with MCI from three dementia centres in Gangwon-do, Korea, were recruited for this study. We defined MCI as a Clinical Dementia Rating global score of ≥0.5, with a memory domain score of ≥0.5. Patients were classified as early-stage or late-stage MCI based on their mini-mental state examination (MMSE) z-scores. We trained a machine learning model using gait and MRI data parameters. The convolutional neural network (CNN) yielded the best classifier performance in separating late-stage MCI from early-stage MCI; its performance was maximised when feature patterns that included multimodal features (GAIT + white matter dataset) were used. The single support time was the strongest predictor. Machine learning that incorporated gait and white matter parameters achieved the highest accuracy in distinguishing between late-stage and early-stage MCI.
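The multimodal setup above fuses gait parameters with white-matter MRI features before classification. A minimal sketch of that early-fusion step on synthetic stand-in data; the feature counts and the logistic-regression classifier are illustrative assumptions (the study used a CNN):

```python
# Sketch: early fusion of two feature modalities by concatenation,
# followed by a simple binary classifier. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 80                                   # cohort size from the abstract
gait = rng.normal(size=(n, 6))           # e.g. single support time, cadence, ...
wm = rng.normal(size=(n, 10))            # white-matter summary features
X = np.hstack([gait, wm])                # early fusion: concatenate modalities
# Toy early/late-stage labels driven mostly by the first gait feature:
y = (gait[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
acc = clf.score(X, y)
print(f"training accuracy: {acc:.2f}")
```

Concatenation is the simplest fusion scheme; CNN-based pipelines like the one in the study typically learn representations per modality before combining them.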