2021
DOI: 10.1186/s13244-021-01115-1
Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics

Abstract: Background Many studies in radiomics are using feature selection methods to identify the most predictive features. At the same time, they employ cross-validation to estimate the performance of the developed models. However, if the feature selection is performed before the cross-validation, data leakage can occur, and the results can be biased. To measure the extent of this bias, we collected ten publicly available radiomics datasets and conducted two experiments. First, the models were develope…

Cited by 47 publications (33 citation statements). References 45 publications.
“…For MR imaging, the used weighting is reported in parentheses; N denotes the number of samples; and in-plane resolution and slice thickness are reported as median and range:

Dataset | Modality | N | In-plane resolution | Slice thickness | Source
Melanoma | CT | 97 | 0.7 (0.5–1.0) | 1.2 (0.6–2.0) | WORC [12]
TCGA-GBM | MR (T1) | 53 | 0.8 (0.4–1.0) | 5.0 (1.0–5.5) | TCIA [16]

…while the other folds were used for training. Feature normalization, feature selection, and classifier training were performed only on the training fold [34]. The trained model was then applied to the test fold.…”
Section: Table 1: Datasets Used in the Experiments
confidence: 99%
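The within-fold processing that the quoted study describes (normalization, feature selection, and classifier training fitted only on the training fold, then applied to the test fold) can be sketched with a scikit-learn Pipeline, which refits every preprocessing step inside each cross-validation split. This is a minimal illustration on synthetic data, not the cited study's actual code; the feature counts and classifier choice are assumptions.

```python
# Minimal sketch: all preprocessing lives inside the Pipeline, so
# cross_val_score refits scaling and feature selection on each
# training fold only — no information from the test fold leaks in.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics table: 100 cases, 200 features.
X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),               # feature normalization
    ("select", SelectKBest(f_classif, k=10)),  # feature selection
    ("clf", LogisticRegression(max_iter=1000)),  # classifier training
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"mean cross-validated AUC: {scores.mean():.3f}")
```

Because the Pipeline is the object passed to `cross_val_score`, each fold gets its own fitted scaler and feature subset, which is exactly the leakage-free procedure the statement describes.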
“…Supervised feature selection and modelling were performed in separate runs of cross-validation, rather than within the cross-validation splits. This procedural error is common in radiomic analyses, and the consequent data leakage results in a bias towards overly complex models [13]. Indeed, decreased external validation performance indicated overfitting.…”
Section: Survival
confidence: 99%
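The bias that this procedural error produces can be demonstrated directly: on pure-noise data, selecting features on the full dataset before cross-validation yields an optimistic AUC, while performing the same selection inside each training fold does not. The sketch below is illustrative (the sample sizes, feature count, and `k` are assumptions, and the data are random), not a reproduction of the paper's experiments.

```python
# Demonstrating the leakage bias on data with NO real signal:
# feature selection before CV inflates the AUC estimate;
# selection inside each training fold stays near chance level.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))        # 60 cases, 2000 noise features
y = rng.integers(0, 2, size=60)        # random binary labels

clf = LogisticRegression(max_iter=1000)

# WRONG: selection sees all labels, including future test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
auc_leaky = cross_val_score(clf, X_leaky, y, cv=5,
                            scoring="roc_auc").mean()

# RIGHT: selection is refit inside each training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=20), clf)
auc_clean = cross_val_score(pipe, X, y, cv=5,
                            scoring="roc_auc").mean()

print(f"leaky AUC: {auc_leaky:.2f}  vs  leakage-free AUC: {auc_clean:.2f}")
```

With many more features than samples, the leaky estimate sits well above 0.5 despite the labels being random, which is the same direction of bias the quoted statement attributes to performing selection outside the cross-validation splits.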
“…This is challenging because different radiomics studies use different subsets of radiomics features to achieve optimal models. The variations in published feature selection approaches make radiomics models less clinically reproducible [23,24]. Therefore, to achieve a clinically reliable radiomics model, it is important to study and account for the effect of the variation in feature selection (FS) methods [25–27].…”
Section: Radiomics for BM Detection
confidence: 99%