2018
DOI: 10.1016/j.neuroimage.2017.06.061

Cross-validation failure: Small sample sizes lead to large error bars

Abstract: Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establish their validity and usefulness is cross-validation, testing prediction on unseen data. Here, I would like to raise awareness on error bars of cross-validation, which are often underestimated. Simple experiments show that sample sizes of many neuroimaging studies inherently lead to large error bars, e.g. ±10% for 100 samples. …
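The order of magnitude quoted in the abstract can be checked with a back-of-the-envelope calculation. The sketch below is not the paper's experiment: it assumes independent test predictions and uses the normal approximation to the binomial distribution, and the function name `binomial_half_width` is hypothetical. Under those assumptions, an accuracy near chance (0.5) measured on 100 samples has a 95% interval of roughly ±10%.

```python
import math


def binomial_half_width(accuracy: float, n_samples: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for an accuracy
    estimated from n_samples independent test predictions.

    Assumption: predictions are independent draws, which cross-validation
    folds only approximate. z = 1.96 corresponds to a 95% interval.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Illustrative sample sizes (chosen here, not taken from the paper).
for n in (30, 100, 300, 1000):
    hw = binomial_half_width(0.5, n)
    print(f"n={n:5d}  half-width = ±{100 * hw:.1f}%")
```

The half-width shrinks only with the square root of the sample size, so halving the error bar requires about four times as many samples.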

Cited by 531 publications (471 citation statements)
References 59 publications (76 reference statements)
“…Our univariate analyses revealed no main effects of history of depression or symptom load for depression on any of the LICA components. The machine learning analyses here revealed overall low predictive value both for case-control status and symptoms of depression and anxiety, which is generally in line with the univariate analyses and an increasing body of literature suggesting small differences in brain structure between patients with MDD and healthy controls (Schmaal et al, 2016;Schmaal et al, 2017;Varoquaux, 2018;Wolfers, Buitelaar, Beckmann, Franke, & Marquand, 2015). While considering the overall low performance, the most important feature for classifying patients with a history of depression from healthy controls was a component encompassing covarying patterns of both high and low GMD in cerebellar regions (IC19).…”
Section: Discussion (supporting)
confidence: 85%
“…Further validations with larger, multicenter cohorts are necessary to contextualize and compare our findings. The trained models depend on both random and non-random class differences in the training sample and especially in light of our limited population sizes, we cannot reliably differentiate between real and random class differences in the trained models [62]. Consequently, we refrained from biological interpretation of the model's parameters, speculating on the exact measure order, and cautiously interpreted the features selected for classification.…”
Section: Discussion (mentioning)
confidence: 99%
“…Classifications were repeated to reduce variance in classification performance evaluations. Nested cross-validations were used to furthermore ensure unbiased regression parameter optimization (Mendelson et al, 2017; Varma & Simon, 2006; Varoquaux, 2018). Thirdly, although of great interest, we refrained from biological interpretation of the model's parameters and weights.…”
Section: Discussion (mentioning)
confidence: 99%
“…Classifications were repeated to reduce variance in classification performance evaluations. Nested cross-validations were used to furthermore ensure unbiased regression parameter optimization (Mendelson et al, 2017;Varma & Simon, 2006;Varoquaux, 2018).…”
Section: Discussion (mentioning)
confidence: 99%