Evidence of Inflated Prediction Performance: A Commentary on Machine Learning and Suicide Research

Jacobucci, Ross; Littlefield, Andrew K.; Millner, Alexander J.; Kleiman, Evan M.; Steinley, Douglas

doi:10.1177/2167702620954216

Cited by 48 publications

(51 citation statements)

References 24 publications


“…Note that while machine learning approaches are valuable and may increase predictive validity, traditional hypothesis testing is also needed to provide an explanation of associations among mHealth variables of interest and mental health. This may be particularly important as recent research indicates that machine learning approaches often inflate predictive performance in mental health research (Jacobucci et al., 2021), although this is not found in all studies (Jacobson et al., 2021). The present study used a novel multimethod approach with wearable indices of biobehavioral functioning as cross-sectional and prospective predictors of adolescent internalizing symptoms across early adolescence.…”

Section: Introduction (mentioning)

confidence: 96%

Concurrent and Prospective Associations Between Fitbit Derived RDoC Arousal and Regulation Constructs and Adolescent Internalizing Symptoms

Nelson,

Flannery,

Duell

et al. 2021

Preprint


Background: Adolescence is a period of alterations in biobehavioral functioning during which individuals are at heightened risk for first onset of psychopathology, particularly internalizing disorders (e.g., depression and anxiety). Recently, researchers have proposed the use of mobile health (mHealth) technologies to passively index biobehavioral functioning in everyday life; yet there is a dearth of research examining how wearable metrics that map onto the NIMH Research Domain Criteria (RDoC) constructs of Arousal and Regulation are associated with concurrent and prospective changes in mental health. Methods: We preregistered secondary data analyses of the Adolescent Brain Cognitive Development (ABCD) Study dataset to determine whether wearable indices of resting heart rate (HR), step count, and sleep duration, as well as variability in these measures, were cross-sectionally associated with internalizing symptomatology in 5,686 adolescents. All models were also run controlling for age, sex, body mass index, socioeconomic status, and race. We also performed prospective analyses across 25 months on the subset of this sample that had Fitbit data available at both baseline and follow-up (n = 143). Results: Cross-sectional analyses revealed that higher resting HR, lower step count and step count variability, and greater variability in sleep duration were associated with greater internalizing symptoms. Cross-lagged panel model analysis revealed that wearable variables did not prospectively predict internalizing symptoms, but greater internalizing symptoms predicted lower step count 25 months later. Conclusions: Findings indicate that wearable indices are associated with internalizing symptoms concurrently, but not prospectively, during early adolescence. Future research should capitalize on the temporal resolution provided by wearable devices to determine the intensive longitudinal relations between biobehavioral risk factors and acute changes in mental health.



“…We also report that these models can use different features to achieve similar performance. Different models emphasize different features not simply because of their relevance to a disorder, but because of the mathematics associated with each model [34,35]. The variability in the feature rankings of our individual models also illustrates the potential danger of relying on the single highest-performing model, a practice common in the published literature.…”
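Why reporting only the single highest-performing model is risky can be shown with a small simulation. This is a hedged sketch, not the authors' analysis: it assumes k models that are all equally good (same true accuracy), scores each on a finite test set, and treats their errors as independent, which is a simplifying assumption (real models scored on the same test set make correlated errors, which shrinks the effect without removing it). All function names and numbers are hypothetical.

```python
import random
import statistics

def observed_accuracy(true_acc, n_test, rng):
    """Test-set accuracy of one model whose true accuracy is true_acc.

    Each of the n_test cases is classified correctly with probability true_acc,
    independently of everything else (a simplifying assumption).
    """
    return sum(rng.random() < true_acc for _ in range(n_test)) / n_test

def best_of_k(k, true_acc, n_test, n_reps, seed):
    """Compare a model chosen in advance with the best-scoring of k equally good models.

    Returns (mean accuracy of model 0, mean accuracy of the top scorer) over n_reps runs.
    """
    rng = random.Random(seed)
    single, best = [], []
    for _ in range(n_reps):
        scores = [observed_accuracy(true_acc, n_test, rng) for _ in range(k)]
        single.append(scores[0])   # decided before seeing scores: unbiased
        best.append(max(scores))   # picked after seeing scores: biased upward
    return statistics.mean(single), statistics.mean(best)

if __name__ == "__main__":
    single, best = best_of_k(k=4, true_acc=0.7, n_test=100, n_reps=2000, seed=0)
    print(f"pre-chosen model: {single:.3f}   best of 4: {best:.3f}")
```

The pre-chosen model's average score sits at its true accuracy, while the "winner" reports a higher number even though no model is actually better; the gap grows with more candidate models and smaller test sets.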

Section: Discussion (mentioning)

confidence: 99%

Identifying bias in models that detect vocal fold paralysis from audio recordings using explainable machine learning and clinician ratings

Low

Randolph

Rao

et al. 2020

Preprint


Objectives: To detect unilateral vocal fold paralysis (UVFP) from voice recordings using an explainable machine learning model. Study Design: Retrospective case series with a control group. Methods: Patients with UVFP confirmed through endoscopic examination (N = 77) and controls with normal voices matched for age and sex (N = 77) were included. Two tasks were used to elicit voice samples: reading the Rainbow Passage and sustaining phonation of the vowel /a/. The 88 features of the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) were extracted as inputs for four machine learning models of differing complexity. Training and testing were performed using bootstrapped cross-validation. SHAP was used to identify important features. Results: The median area under the receiver operating characteristic curve (ROC AUC) ranged from 0.79 to 0.87 depending on model and task. After removing redundant features for explainability, the highest median ROC AUC was 0.84 using only 13 features for the vowel task and 0.87 using 39 features for the reading task. The most important features included intensity measures, mean MFCC1, mean F1 amplitude and frequency, and shimmer variability, depending on model and task. Conclusion: Using the largest dataset studying UVFP to date, we achieve high performance from just a few seconds of voice recordings while discovering which acoustic features are important across models. Notably, we demonstrate that the models use different combinations of features to achieve similar effect sizes. Overall, the categories of features related to vocal fold physiology were conserved across the models. Machine learning thus provides a mechanism to detect UVFP and to contextualize detection accuracy relative to both model architecture and pathophysiology. Level of Evidence: Type 3
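The abstract reports a median ROC AUC under "bootstrapped cross-validation" without spelling out the procedure. As a hedged illustration only, the sketch below reads it as out-of-bag bootstrap validation: train on a bootstrap resample, score the cases left out, repeat, and report the median. ROC AUC is computed as the probability that a randomly chosen positive case outscores a randomly chosen negative one (ties count one half), which equals the area under the ROC curve. The one-feature model and every function name are hypothetical; the authors' four models and eGeMAPS features are not reproduced.

```python
import random
import statistics

def roc_auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_mean_difference(x, y):
    """Hypothetical toy model: difference between the class means of one feature."""
    m1 = statistics.mean([v for v, l in zip(x, y) if l == 1])
    m0 = statistics.mean([v for v, l in zip(x, y) if l == 0])
    return m1 - m0  # its sign says which direction of x indicates the positive class

def median_oob_auc(x, y, n_boot, seed):
    """Fit on each bootstrap resample, score its out-of-bag cases, return the median AUC."""
    rng = random.Random(seed)
    n = len(y)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(idx))
        yb = [y[i] for i in idx]
        y_oob = [y[i] for i in oob]
        if len(set(yb)) < 2 or len(set(y_oob)) < 2:
            continue  # skip resamples where one class is missing
        direction = fit_mean_difference([x[i] for i in idx], yb)
        aucs.append(roc_auc([x[i] * direction for i in oob], y_oob))
    return statistics.median(aucs)

if __name__ == "__main__":
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 pairs ordered correctly
```

Reporting the median across resamples, rather than the best single split, keeps one lucky partition from driving the headline number.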


“…In fact, prominent critiques warn against overreliance on ML methods in suicide research (Siddaway et al., 2020). A recent critical commentary published in this journal (Jacobucci et al., 2021), for example, challenged the predictive superiority of ML methods over more traditional methods, such as standard logistic regression, and cited a few examples in which both approaches achieved similar results (e.g., van Mens et al., 2020). Moreover, the authors demonstrated artificially inflated prediction performance for some machine learning methods (specifically, when researchers paired the optimism-corrected bootstrap with random forests instead of using other internal validation methods, such as k-fold cross-validation).…”

Section: Challenges and Practical Recommendations (mentioning)

confidence: 99%

The Hitchhiker’s Guide to Computational Linguistics in Suicide Prevention

Ophir

Tikochinski

Klomek

et al. 2021

Clinical Psychological Science


Suicide, a leading cause of death, is a complex and hard-to-predict human tragedy. In this article, we introduce a comprehensive outlook on the emerging movement to integrate computational linguistics (CL) into suicide prevention research and practice. Focusing mainly on state-of-the-art deep neural network models, in this “travel guide” article we describe, in relatively plain language, how CL methodologies could facilitate early detection of suicide risk. Major potential contributions of CL methodologies (e.g., word embeddings, interpretational frameworks) to deepening the theoretical understanding of suicidal behaviors and promoting a personalized approach in psychological assessment are presented as well. We also discuss principal ethical and methodological obstacles in CL-based suicide prevention, such as the difficulty of maintaining people’s privacy and safety or of interpreting the “black box” of prediction algorithms. Ethical guidelines and practical methodological recommendations addressing these obstacles are provided for future researchers and clinicians.
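The inflation mechanism described in the quoted critique (optimism-corrected bootstrap paired with random forests) can be reproduced in a few lines. This is a minimal sketch under stated assumptions, not Jacobucci et al.'s code: a 1-nearest-neighbour classifier stands in for a random forest (both can fit their training data almost perfectly), accuracy stands in for AUC, and the data are simulated pure noise, so any honest estimate should sit near 0.5. The correction follows the usual optimism-bootstrap recipe (apparent performance minus the average gap between each bootstrap model's performance on its own resample and on the original data). All function names are hypothetical.

```python
import random

# Assumptions: 1-NN stands in for a random forest (both can memorise their
# training data), accuracy stands in for AUC, and the data are pure noise.

def make_noise_data(n, p, seed):
    """n cases, p Gaussian predictors, coin-flip outcome: true accuracy is 0.5."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y

def fit_1nn(X, y):
    """'Training' a 1-NN classifier just stores the labelled cases."""
    return list(zip(X, y))

def predict_1nn(model, x):
    """Return the label of the stored case closest to x (squared Euclidean distance)."""
    nearest = min(model, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(model, X, y):
    return sum(predict_1nn(model, xi) == yi for xi, yi in zip(X, y)) / len(y)

def optimism_corrected_bootstrap(X, y, n_boot, seed):
    """Apparent accuracy minus the mean optimism over bootstrap resamples.

    Optimism for one resample = accuracy on the resample itself minus the
    accuracy of the same model on the original data.
    """
    rng = random.Random(seed)
    n = len(y)
    apparent = accuracy(fit_1nn(X, y), X, y)
    total_optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        model_b = fit_1nn(Xb, yb)
        total_optimism += accuracy(model_b, Xb, yb) - accuracy(model_b, X, y)
    return apparent - total_optimism / n_boot

def k_fold_cv(X, y, k, seed):
    """k-fold cross-validation: each case is scored by a model that never saw it."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)
    hits = 0
    for f in range(k):
        fold = order[f::k]
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        model = fit_1nn([X[i] for i in train], [y[i] for i in train])
        hits += sum(predict_1nn(model, X[i]) == y[i] for i in fold)
    return hits / len(y)

if __name__ == "__main__":
    X, y = make_noise_data(n=120, p=5, seed=1)
    print("optimism-corrected bootstrap:", round(optimism_corrected_bootstrap(X, y, 20, seed=2), 3))
    print("10-fold cross-validation:", round(k_fold_cv(X, y, 10, seed=3), 3))
```

Because roughly 63% of the original cases reappear in each bootstrap resample, a memorizing model scores well on the "original" data it is checked against, so the estimated optimism is too small and the corrected estimate stays far above chance (around 0.8 here). k-fold cross-validation never scores a case with a model that saw it, and lands near the true value of 0.5.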
