Bipolar disorders (BDs) are among the leading causes of morbidity and disability. Objective biological markers, such as those based on brain imaging, could aid in the clinical management of BD. Machine learning (ML) brings neuroimaging analyses to the individual-subject level and may allow for their diagnostic use. However, fair and optimal application of ML requires large, multi-site datasets. We applied ML (support vector machines) to MRI data (regional cortical thickness, surface area, and subcortical volumes) from 853 BD and 2167 control participants from 13 cohorts in the ENIGMA consortium. We attempted to differentiate BD from control participants, investigated different data-handling strategies, and studied the neuroimaging and clinical features most important for classification. Individual site accuracies ranged from 45.23% to 81.07%. Aggregate subject-level analyses yielded the highest accuracy (65.23%, 95% CI = 63.47-67.00; ROC-AUC = 71.49%, 95% CI = 69.39-73.59), followed by leave-one-site-out cross-validation (accuracy = 58.67%, 95% CI = 56.70-60.63). Meta-analysis of individual site accuracies did not yield above-chance results. There was substantial agreement between the regions that contributed to the identification of BD participants in the best-performing site and in the aggregate dataset (Cohen's kappa = 0.83, 95% CI = 0.829-0.831). Treatment with anticonvulsants and age were associated with greater odds of correct classification. Although short of the 80% accuracy threshold considered clinically relevant, the results are promising and provide a fair and realistic estimate of the classification performance achievable in a large, ecologically valid, multi-site sample of BD participants based on regional neurostructural measures. Furthermore, the significant classification in different samples was based on plausible and similar neuroanatomical features. Future multi-site studies should move towards sharing raw/voxelwise neuroimaging data.
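The leave-one-site-out scheme described above can be sketched with scikit-learn, where each site serves once as the held-out test fold. This is a minimal illustration on synthetic data, not the ENIGMA pipeline itself: the feature matrix, effect size, and site structure are invented assumptions standing in for the regional thickness/area/volume measures.

```python
# Sketch of leave-one-site-out cross-validation with a linear SVM.
# All data below are synthetic and purely illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_features, n_sites = 300, 20, 5

# Synthetic "regional" features with a weak case-control shift.
y = rng.integers(0, 2, n_subjects)            # 0 = control, 1 = BD
X = rng.normal(size=(n_subjects, n_features))
X[y == 1, :5] += 0.3                          # small group effect on 5 features
sites = rng.integers(0, n_sites, n_subjects)  # site membership per subject

# Standardize within the pipeline so scaling is fit on training folds only.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X, y, groups=sites, cv=LeaveOneGroupOut())
print("Per-site accuracies:", np.round(scores, 3))
print(f"Mean leave-one-site-out accuracy: {scores.mean():.3f}")
```

Because each test fold is an entire site, the score reflects generalization to an unseen scanner/population, which is typically lower than pooled subject-level cross-validation — consistent with the gap the abstract reports between the aggregate and leave-one-site-out accuracies.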
Identifying neurobiological differences between patients with major depressive disorder (MDD) and healthy individuals has been a mainstay of clinical neuroscience for decades. However, recent meta-analyses have raised concerns regarding the replicability and clinical relevance of brain alterations in depression. OBJECTIVE: To quantify the upper bounds of univariate effect sizes, estimated predictive utility, and distributional dissimilarity of healthy individuals and those with depression across structural magnetic resonance imaging (MRI), diffusion tensor imaging, and task-based as well as resting-state functional MRI, and to compare the results with an MDD polygenic risk score (PRS) and environmental variables. DESIGN, SETTING, AND PARTICIPANTS: This was a cross-sectional, case-control clinical neuroimaging study. Data were part of the Marburg-Münster Affective Disorders Cohort Study. Patients with depression and healthy controls were recruited from primary care and the general population in Münster and Marburg, Germany. Study recruitment was performed from September 11, 2014, to September 26, 2018. The sample comprised patients with acute and chronic MDD as well as healthy controls aged 18 to 65 years. Data were analyzed from October 29, 2020, to April 7, 2022. MAIN OUTCOMES AND MEASURES: Primary analyses included univariate partial effect size (η²), classification accuracy, and the distributional overlapping coefficient for healthy individuals and those with depression across neuroimaging modalities, controlling for age, sex, and additional modality-specific confounding variables. Secondary analyses included patient subgroups with acute or chronic depressive status. RESULTS: A total of 1809 individuals (861 patients [47.6%] and 948 controls [52.4%]) were included in the analysis (mean [SD] age, 35.6 [13.2] years; 1165 female participants [64.4%]).
The upper bound of the effect sizes for the single univariate measures displaying the largest group differences ranged from a partial η² of 0.004 to 0.017, distributions overlapped between 87% and 95%, and classification accuracies ranged between 54% and 56% across neuroimaging modalities. This pattern remained virtually unchanged when considering only patients with either acute or chronic depression. Differences were comparable with those found for the PRS but substantially smaller than for environmental variables. CONCLUSIONS AND RELEVANCE: Results of this case-control study suggest that even for the maximum univariate biological differences, deviations between patients with MDD and healthy controls were remarkably small, single-participant prediction was not possible, and similarity between study groups dominated. Biological psychiatry should focus on meaningful outcome measures and predictive approaches to increase the potential for personalizing clinical practice.
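The three group-difference metrics reported above (partial η², distributional overlap, and univariate classification accuracy) can be illustrated on synthetic data. This is a hedged sketch, not the study's analysis code: the group shift is an invented assumption chosen so the resulting η² falls roughly in the reported 0.004-0.017 range, and the overlap formula assumes equal-variance normal distributions.

```python
# Sketch of partial eta squared, overlapping coefficient (OVL), and
# optimal univariate threshold accuracy for two groups. Synthetic data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 900
controls = rng.normal(0.00, 1.0, n)
patients = rng.normal(0.15, 1.0, n)  # small assumed true shift

# Eta squared: between-group sum of squares over total sum of squares
# (equal group sizes, one-way design).
grand = np.concatenate([controls, patients])
ss_between = n * ((controls.mean() - grand.mean()) ** 2
                  + (patients.mean() - grand.mean()) ** 2)
eta2 = ss_between / ((grand - grand.mean()) ** 2).sum()

# OVL under an equal-variance normal assumption: 2 * Phi(-|d| / 2),
# where d is Cohen's d.
pooled_sd = np.sqrt((controls.var(ddof=1) + patients.var(ddof=1)) / 2)
d = (patients.mean() - controls.mean()) / pooled_sd
ovl = 2 * norm.cdf(-abs(d) / 2)

# Balanced accuracy of the midpoint-threshold classifier.
threshold = (controls.mean() + patients.mean()) / 2
acc = ((patients > threshold).mean() + (controls <= threshold).mean()) / 2

print(f"partial eta^2 = {eta2:.4f}, OVL = {ovl:.2f}, accuracy = {acc:.2f}")
```

The sketch makes the study's point tangible: a shift small enough to give η² near 0.01 leaves the two distributions overwhelmingly overlapping and pushes single-subject accuracy only a few points above chance.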
We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: while we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect, focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from Major Depressive Disorder (MDD) versus healthy controls based on neuroimaging data. Drawing upon structural MRI data from a balanced sample of N = 1868 MDD patients and healthy controls from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset, which yielded an accuracy of 61%. Next, we mimicked the process by which researchers would draw samples of various sizes (N = 4 to N = 150) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes (N = 20), we observed accuracies of up to 95%. For medium sample sizes (N = 100), accuracies of up to 75% were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation, whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
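The sampling experiment described above can be reproduced in miniature: draw repeated small samples from a large population with modest true separability, fit and evaluate a classifier on each, and compare the spread of accuracy estimates across sample sizes. Everything here is synthetic and illustrative — the population, the effect size, and the classifier are assumptions, not the PAC data or pipeline.

```python
# Simulation of accuracy misestimation from small samples: repeated
# small draws from one population yield wildly varying (often inflated)
# accuracy estimates. All data are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_pop, n_feat = 20000, 10
y_pop = rng.integers(0, 2, n_pop)
X_pop = rng.normal(size=(n_pop, n_feat))
X_pop[y_pop == 1] += 0.1  # weak signal -> modest "true" accuracy

def sampled_accuracy(n):
    """Draw n subjects, split 50/50, return held-out accuracy."""
    idx = rng.choice(n_pop, size=n, replace=False)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_pop[idx], y_pop[idx], test_size=0.5,
        stratify=y_pop[idx], random_state=0)
    return SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)

results = {}
for n in (20, 100, 1000):
    accs = [sampled_accuracy(n) for _ in range(50)]
    results[n] = (np.mean(accs), np.max(accs))
    print(f"N={n:5d}: mean accuracy={results[n][0]:.2f}, "
          f"max across 50 draws={results[n][1]:.2f}")
```

The maximum accuracy across repeated small draws far exceeds the large-sample mean, mirroring the paper's observation that small-N studies can report accuracies near 95% even when the population-level accuracy is close to 60%; the fix is a test set large enough to shrink that variance.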