2020
DOI: 10.1093/bioinformatics/btaa046

Consensus features nested cross-validation

Abstract: Feature selection can improve the accuracy of machine-learning models, but appropriate steps must be taken to avoid overfitting. Nested cross-validation (nCV) is a common approach that chooses the classification model and features to represent a given outer fold based on features that give the maximum inner-fold accuracy. Differential privacy is a related technique to avoid overfitting that uses a privacy-preserving noise mechanism to identify features that are stable between training…
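
To make the nCV pattern concrete, here is a minimal sketch in Python with scikit-learn. The synthetic dataset, the univariate k-best selector, and the random-forest classifier are illustrative assumptions, not the paper's consensus-features or differential-privacy method; the sketch shows only the general nested structure the abstract refers to.

# Minimal nested cross-validation sketch: features and hyperparameters are
# chosen on inner folds only, so outer-fold accuracy stays unbiased.
# Assumptions: synthetic data, univariate k-best selection, random forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# Selector and classifier live inside one pipeline, so every inner fold
# re-runs feature selection on its own training split (no leakage).
pipe = Pipeline([("select", SelectKBest(f_classif)),
                 ("clf", RandomForestClassifier(random_state=0))])

inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Inner CV picks the feature count that maximizes inner-fold accuracy.
search = GridSearchCV(pipe, {"select__k": [5, 10, 20]}, cv=inner)

# Outer CV estimates the generalization accuracy of the whole procedure.
scores = cross_val_score(search, X, y, cv=outer)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")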


Cited by 133 publications (121 citation statements)
References 25 publications (32 reference statements)
“…Independent variables were combinations of cortical and subcortical manifold features at baseline and their age-related trajectory data. We used elastic net regularization with nested ten-fold cross-validation (Cawley and Talbot, 2010; Parvandeh et al., 2020; Tenenbaum et al., 2000; Varma and Simon, 2006; Zou and Hastie, 2005) (see Methods), and repeated the prediction 100 times with different training and test dataset compositions to mitigate subject selection bias. Across cross-validation and iterations, 6.24 ± 5.74 (mean ± SD) features were selected to predict IQ using manifold eccentricity of cortical regions at baseline, 6.20 ± 5.14 cortical features at baseline and maturational change, 5.45 ± 5.99 cortical and subcortical features at baseline, and 5.16 ± 5.43 at baseline and maturational change, suggesting that adding more independent variables may not per se lead to improvement in prediction accuracy.…”
Section: Results
Citation type: mentioning, confidence: 99%
“…Four different feature sets were evaluated: (1) manifold eccentricity of the identified cortical regions at baseline, (2) manifold eccentricity at baseline and its longitudinal change (i.e., differences between follow-up and baseline), (3) cortical manifold eccentricity and subcortical-weighted manifold of the identified regions at baseline, and (4) manifold eccentricity and subcortical-weighted manifold at baseline and their longitudinal changes. For each evaluation, a subset of features that could predict future IQ was identified using elastic net regularization (ρ = 0.5) with optimized regularization parameters (L1 and L2 penalty terms) via nested ten-fold cross-validation (Cawley and Talbot, 2010; Parvandeh et al., 2020; Tenenbaum et al., 2000; Varma and Simon, 2006; Zou and Hastie, 2005). We split the dataset into training (9/10) and test (1/10) partitions, and each training partition was further split into inner training and testing folds using another ten-fold cross-validation.…”
Section: Methods
Citation type: mentioning, confidence: 99%
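
A minimal sketch of the nested elastic-net procedure this statement describes, using placeholder data: outer ten-fold splits (9/10 train, 1/10 test) with an inner ten-fold CV that tunes the penalty strength on each training partition. Counting nonzero coefficients as "selected" features is an assumption consistent with elastic-net selection, and l1_ratio = 0.5 mirrors the quoted ρ = 0.5.

# Sketch of nested ten-fold CV with elastic net (l1_ratio = 0.5 for ρ = 0.5),
# assuming placeholder data; nonzero coefficients count as "selected" features.
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))                        # placeholder predictors
y = X[:, :4] @ rng.normal(size=4) + rng.normal(scale=0.5, size=120)

outer = KFold(n_splits=10, shuffle=True, random_state=0)
n_selected, scores = [], []
for train_idx, test_idx in outer.split(X):            # 9/10 train, 1/10 test
    # Inner ten-fold CV on the training partition tunes the penalty strength.
    model = ElasticNetCV(l1_ratio=0.5, cv=KFold(10, shuffle=True, random_state=1))
    model.fit(X[train_idx], y[train_idx])
    n_selected.append(np.count_nonzero(model.coef_))  # "selected" features
    scores.append(model.score(X[test_idx], y[test_idx]))

print(f"features selected per fold: {np.mean(n_selected):.2f} +/- {np.std(n_selected):.2f}")
print(f"mean held-out R^2: {np.mean(scores):.3f}")

The citing work additionally repeats the whole procedure 100 times with different training and test dataset compositions; that corresponds to wrapping the loop above in an outer repetition with varying random seeds.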
“…Independent variables were combinations of cortical and subcortical manifold features at baseline and their age-related trajectory data. We used elastic net regularization with nested ten-fold cross-validation (Cawley and Talbot, 2010; Parvandeh et al., 2020; Tenenbaum et al., 2000; Varma and Simon, 2006; Zou and Hastie, 2005) (see Methods), and repeated the prediction 100 times with different training and test dataset compositions to mitigate subject selection bias. Across cross-validation and iterations, 6.24 ± 5.74 (mean ± SD) features were selected to predict IQ using manifold eccentricity of cortical regions at baseline, 6.20 ± 5.14 cortical features at baseline and maturational change, 5.45 ± 5.99 cortical and subcortical features at baseline, and 5.16 ± 5.43 at baseline and maturational change, suggesting that adding more independent variables may not per se lead to improvement in prediction accuracy.…”
Section: Association Between Connectome Manifold and Cognitive Function
Citation type: mentioning, confidence: 99%
“…For each evaluation, a subset of features that could predict future IQ was identified using elastic net regularization (ρ = 0.5) with optimized regularization parameters (L1 and L2 penalty terms) via nested ten-fold cross-validation (Cawley and Talbot, 2010; Parvandeh et al., 2020; Tenenbaum et al., 2000; Varma and Simon, 2006; Zou and Hastie, 2005) … (Meng et al., 1992). In addition to predicting future IQ, we performed the same prediction analysis to predict the change of IQ between the baseline and follow-up.…”
Section: Association With the Development of Cognitive Function
Citation type: mentioning, confidence: 99%
“…We associated clinical variables of duration and onset of epilepsy with atypical asymmetry index and cortical atrophy using supervised machine learning. We utilized five-fold nested cross-validation (Tenenbaum et al., 2000; Varma and Simon, 2006; Cawley and Talbot, 2010; Parvandeh et al., 2020) with least absolute shrinkage and selection operator (LASSO) regression (Tibshirani, 1996). We split the dataset into training (4/5) and test (1/5) partitions, and each training partition was further split into inner training and testing folds using another five-fold cross-validation.…”
Section: Associations With Clinical Variables
Citation type: mentioning, confidence: 99%
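
The five-fold LASSO variant described here follows the same nested pattern; a compact sketch with placeholder data (the feature matrix and clinical target below are illustrative assumptions):

# Sketch of five-fold nested CV with LASSO: inner five-fold CV tunes the
# L1 penalty, outer folds give unbiased held-out scores. Data are placeholders.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))                       # placeholder imaging features
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)  # placeholder clinical variable

outer = KFold(n_splits=5, shuffle=True, random_state=0)  # 4/5 train, 1/5 test
for train_idx, test_idx in outer.split(X):
    # Inner five-fold CV on the training partition tunes the L1 penalty.
    model = LassoCV(cv=KFold(5, shuffle=True, random_state=1))
    model.fit(X[train_idx], y[train_idx])
    print(f"held-out R^2: {model.score(X[test_idx], y[test_idx]):.3f}")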