Generalized anxiety disorder (GAD) and major depressive disorder (MDD) are highly prevalent and impairing problems, but frequently go undetected, leading to substantial treatment delays. Electronic health records (EHRs) collect a great deal of biometric markers and patient characteristics that could foster the detection of GAD and MDD in primary care settings. We approached the problem of predicting MDD and GAD using a novel machine learning pipeline to re-analyze data from an observational study. The pipeline constitutes an ensemble of algorithmically distinct machine learning methods, including deep learning. A sample of 4,184 undergraduate students completed the study, undergoing a general health screening and completing a psychiatric assessment for MDD and GAD. After explicitly excluding all psychiatric information, 59 biomedical and demographic features from the general health survey in addition to a set of engineered features were used for model training. We assessed the model's performance on a held-out test set and found an AUC of 0.73 (sensitivity: 0.66, specificity: 0.7) and 0.67 (sensitivity: 0.55, specificity: 0.7) for GAD, and MDD, respectively. Additionally, we used advanced techniques (SHAP values) to illuminate which features had the greatest impact on prediction for each disease. The top predictive features for MDD were being satisfied with living conditions and having public health insurance. The top predictive features for GAD were vaccinations being up to date and marijuana use. Our results indicate moderate predictive performance for the application of machine learning methods in detection of GAD and MDD based on EHR data. By identifying important predictors of GAD and MDD, these results may be used in future research to aid in the early detection of MDD and GAD.
Background: Generalized anxiety disorder (GAD) and major depressive disorder (MDD) are highly prevalent and impairing problems, but frequently go undetected, leading to substantial treatment delays. Electronic medical records (EMRs) collect a great deal of biometric markers and patient characteristics that could foster the detection of GAD and MDD in primary care settings. Methods: We approach the problem of predicting MDD and GAD using a novel machine learning pipeline. The pipeline constitutes an ensemble of algorithmically distinct machine learning methods, including deep learning. A sample of 4,184 undergraduate students completed the study, undergoing a general health screening and completing a psychiatric assessment for MDD and GAD. Using 59 biomedical and demographic features from the general health survey and an additional set of engineered features, we trained the model to predict GAD and MDD. Results: We assessed the model's performance on a held-out test set and found an AUC of 0.72 and 0.66 for GAD, and MDD, respectively. Additionally, we used advanced techniques (Shapley values) to illuminate which features had the greatest impact on prediction for each disease. The top predictive features for MDD were “difficulty memorizing lessons”, “financial difficulties” and “alcohol consumption”. The top predictive features for GAD were the necessity for a control examination, being overweight/obese, and irregular meal consumption. Conclusions: Our results indicate a successful application of machine learning methods in detection of GAD and MDD based on EMR-like data. By identifying biomarkers of GAD and MDD, these results may be used in future research to aid in the early detection of MDD and GAD.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.