The (Limited?) Utility of Brain Age as a Biomarker for Capturing Fluid Cognition in Older Individuals

Tetereva, Alina; Pornpattananangkul, Narun

doi:10.1101/2022.12.31.522374

Cited by 2 publications

(2 citation statements)

References 83 publications


“…Understanding individual differences in brain-behavior relationships is a central goal of neuroscience. As part of this goal, machine learning approaches using neuroimaging data, such as functional connectivity, have grown increasingly popular in predicting numerous phenotypes 1 , including cognitive performance 2–6 , age 7–10 , and several clinically-relevant outcomes 11–13 . Compared to classic statistical inference, prediction offers advantages in replicability and generalizability, as it evaluates models on participants unseen during model training 14,15 .…”

Section: Introduction

mentioning

confidence: 99%

“…as functional connectivity, have grown increasingly popular in predicting numerous phenotypes 1 , including cognitive performance 2–6 , age 7–10 , and several clinically-relevant outcomes 11–13 . Compared to classic statistical inference, prediction offers advantages in replicability and generalizability, as it evaluates models on participants unseen during model training 14,15 .…”

mentioning

confidence: 99%


The effects of data leakage on connectome-based machine learning models

Rosenblatt, Tejavibulya, Jiang et al. 2023

Preprint

Predictive modeling has now become a central technique in neuroimaging to identify complex brain-behavior relationships and test their generalizability to unseen data. However, data leakage, which unintentionally breaches the separation between data used to train and test the model, undermines the validity of predictive models. Although previous literature suggests that leakage is generally pervasive in machine learning, few studies have empirically evaluated the effects of leakage in neuroimaging data. Here, using over 500 different pipelines spanning four large neuroimaging datasets and three phenotypes, we evaluated six forms of leakage fitting into three broad categories: feature selection, covariate correction, and lack of independence between subjects. As expected, leakage via feature selection and repeated subjects drastically inflated prediction performance. Notably, other forms of leakage had only minor effects (e.g., leaky site correction) or even decreased prediction performance (e.g., leaky covariate regression). In some cases, leakage affected not only prediction performance, but also model coefficients, and thus neurobiological interpretations. Overall, our results illustrate the variable effects of leakage on prediction pipelines and underscore the importance of avoiding data leakage to improve the validity and reproducibility of predictive modeling.


Data leakage inflates prediction performance in connectome-based machine learning models

Rosenblatt, Tejavibulya, Jiang et al. 2024

Nat Commun

Predictive modeling is a central technique in neuroimaging to identify brain-behavior relationships and test their generalizability to unseen data. However, data leakage undermines the validity of predictive models by breaching the separation between training and test data. Leakage is always an incorrect practice but still pervasive in machine learning. Understanding its effects on neuroimaging predictive models can inform how leakage affects existing literature. Here, we investigate the effects of five forms of leakage–involving feature selection, covariate correction, and dependence between subjects–on functional and structural connectome-based machine learning models across four datasets and three phenotypes. Leakage via feature selection and repeated subjects drastically inflates prediction performance, whereas other forms of leakage have minor effects. Furthermore, small datasets exacerbate the effects of leakage. Overall, our results illustrate the variable effects of leakage and underscore the importance of avoiding data leakage to improve the validity and reproducibility of predictive modeling.
