Pattern recognition predictive models have become an important tool for analyzing neuroimaging data and answering questions in clinical and cognitive neuroscience. Regardless of the application, the most commonly used method to quantify model performance is to calculate prediction accuracy, i.e. the proportion of correctly classified samples. While accuracy is simple and intuitive, other performance measures are often more appropriate for many common goals of neuroimaging pattern recognition studies. In this paper, we review alternative performance measures and focus on their interpretation and on practical aspects of model evaluation. Specifically, we focus on four families of performance measures: 1) categorical performance measures such as accuracy, 2) rank based performance measures such as the area under the curve, 3) probabilistic performance measures based on quadratic error such as the Brier score, and 4) probabilistic performance measures based on information criteria such as the logarithmic score. We examine their statistical properties in various settings using simulated data and real neuroimaging data derived from public datasets. Results showed that accuracy performed worst with respect to statistical power, detection of model improvement, selection of informative features, and reliability of results. Therefore, in most cases, it should not be used to make statistical inferences about model performance. Accuracy should also be avoided for evaluating the utility of clinical models, because it does not take into account clinically relevant information, such as the relative cost of false-positive and false-negative misclassifications or the calibration of probabilistic predictions. We recommend choosing alternative evaluation criteria according to the goals of the specific machine learning model.
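As a rough illustration of the four families named above, the following minimal sketch computes one measure from each family for a binary classifier. It is not the paper's code: the function names, the 0.5 decision threshold, the toy labels and probabilities, and the sign convention for the logarithmic score (mean negative log-likelihood, lower is better) are all assumptions made for this example.

```python
import numpy as np

def accuracy(y_true, p, threshold=0.5):
    """Categorical: proportion of correctly classified samples (threshold is an assumption)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(p, dtype=float) >= threshold).astype(int)
    return float(np.mean(y_pred == y_true))

def auc(y_true, p):
    """Rank based: chance that a random positive outscores a random negative (ties count 0.5)."""
    y_true = np.asarray(y_true)
    p = np.asarray(p, dtype=float)
    pos, neg = p[y_true == 1], p[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # all positive-vs-negative score differences
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def brier_score(y_true, p):
    """Quadratic error: mean squared difference between probability and outcome."""
    return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y_true)) ** 2))

def log_score(y_true, p, eps=1e-12):
    """Logarithmic score, here as mean negative log-likelihood (assumed convention)."""
    y = np.asarray(y_true)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Hypothetical toy data: two models that classify and rank every sample
# correctly but differ in how confident their probabilities are.
y = [0, 0, 1, 1]
confident = [0.1, 0.2, 0.8, 0.9]
hesitant = [0.4, 0.45, 0.55, 0.6]

for name, p in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(y, p), auc(y, p), brier_score(y, p), log_score(y, p))
```

In this toy case both models reach the same accuracy and the same area under the curve, while the Brier score and the logarithmic score separate them. That is one way to picture why a categorical measure may fail to detect a genuine improvement in a model's probabilistic predictions.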
Our group developed a transcriptome-based polygenic risk score (T-PRS) that uses common genetic variants to capture "depression-like" shifts in cortical gene expression. Here, we mapped T-PRS onto diagnosis and symptom severity in major depressive disorder (MDD) cases and controls from the Psychiatric Genomics Consortium (PGC). To evaluate potential mechanisms, we further mapped T-PRS onto discrete measures of brain morphology and broad depression risk in healthy young adults. Genetic, self-report, and/or neuroimaging data were available in 29,340 PGC participants (59% women; 12,923 MDD cases, 16,417 controls) and 482 participants in the Duke Neurogenetics Study (DNS: 53% women; aged 19.8 ± 1.2 years). T-PRS was computed from SNP data using PrediXcan to impute cortical expression levels of MDD-related genes from a previous post-mortem transcriptome meta-analysis. Sex-specific regressions were used to test effects of T-PRS on depression diagnosis and symptom severity in the PGC sample, and on FreeSurfer-derived subcortical volume, cortical thickness, surface area, and local gyrification index in the DNS sample. T-PRS did not predict depression diagnosis (OR=1.007, 95% CI [0.997, 1.018]); however, it correlated with symptom severity in men (rho=0.175, p=7.957×10⁻⁴) in one large PGC cohort (N=762, 48% men). In DNS, T-PRS was associated with smaller amygdala volume in women (β=-0.186, t=-3.478, p=.001) and less prefrontal gyrification (max≤-2.970, p≤.006) in both sexes. In men, prefrontal gyrification mediated an indirect effect of T-PRS on broad depression risk (b=.005, p=.029), indexed using self-reported family history of depression. Depression-like shifts in cortical gene expression predict symptom severity in men and may contribute to disease vulnerability through their effect on cortical gyrification.
Most smartphones and wearables are now equipped with location sensing (using the Global Positioning System and mobile network information) that enables continuous location tracking of their users. Several studies have reported that the amount of time per day that an individual experiencing symptoms of Major Depressive Disorder (MDD) spends at home (i.e., home stay), as well as various mobility-related metrics, is associated with symptom severity in MDD. Because those studies used small and homogeneous cohorts of participants, it is uncertain whether their findings generalize to a broader population of individuals with MDD symptoms. In the present study, we examined the relationship between overall severity of depressive symptoms, as assessed by the eight-item Patient Health Questionnaire (PHQ-8), and median daily home stay over the two weeks preceding the completion of a questionnaire in individuals with MDD. We used questionnaire and geolocation data from 164 participants collected in the observational Remote Assessment of Disease and Relapse - Major Depressive Disorder (RADAR-MDD) study. Participant age and severity of MDD symptoms were found to be significantly related to home stay, with older and more severely affected individuals spending more time at home. The association between home stay and symptom severity appeared to be stronger on weekdays than on weekends. Furthermore, we found a significant modulation of home stay by occupational status, with employment reducing home stay. Our findings suggest that home stay is associated with symptom severity in MDD and demonstrate the importance of accounting for confounding factors in future MDD studies.