2021
DOI: 10.1186/s13244-021-01115-1
Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics

Abstract: Background Many studies in radiomics are using feature selection methods to identify the most predictive features. At the same time, they employ cross-validation to estimate the performance of the developed models. However, if the feature selection is performed before the cross-validation, data leakage can occur, and the results can be biased. To measure the extent of this bias, we collected ten publicly available radiomics datasets and conducted two experiments. First, the models were develope…

Cited by 47 publications (33 citation statements). References 45 publications.
“…For MR imaging, the used weighting is reported in parentheses; N denotes the number of samples; and in-plane resolution and slice thickness are reported as median and range:

Dataset | Modality | N | In-plane resolution | Slice thickness | Source
Melanoma | CT | 97 | 0.7 (0.5–1.0) | 1.2 (0.6–2.0) | WORC [12]
TCGA-GBM | MR (T1) | 53 | 0.8 (0.4–1.0) | 5.0 (1.0–5.5) | TCIA [16]

…while the other folds were used for training. Feature normalization, feature selection, and classifier training were performed only on the training fold [34]. The trained model was then applied to the test fold.…”
Section: Table 1: Datasets Used in the Experiments
confidence: 99%
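The within-fold processing that the quoted study describes (normalization, feature selection, and classifier training fitted only on the training fold, then applied to the test fold) can be sketched with a scikit-learn Pipeline, which refits every preprocessing step inside each cross-validation split. This is a minimal illustration on synthetic data, not the cited study's actual code; the feature counts and classifier choice are assumptions.

```python
# Minimal sketch: all preprocessing lives inside the Pipeline, so
# cross_val_score refits scaling and feature selection on each
# training fold only — no information from the test fold leaks in.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics table: 100 cases, 200 features.
X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),               # feature normalization
    ("select", SelectKBest(f_classif, k=10)),  # feature selection
    ("clf", LogisticRegression(max_iter=1000)),  # classifier training
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"mean cross-validated AUC: {scores.mean():.3f}")
```

Because the Pipeline is the object passed to `cross_val_score`, each fold gets its own fitted scaler and feature subset, which is exactly the leakage-free procedure the statement describes.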
“…Supervised feature selection and modelling were performed in separate runs of cross-validation, rather than within the cross-validation splits. This procedural error is common in radiomic analyses, and the consequent data leakage results in a bias towards overly complex models [13]. Indeed, decreased external validation performance indicated overfitting.…”
Section: Survival
confidence: 99%
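The bias that this procedural error produces can be demonstrated directly: on pure-noise data, selecting features on the full dataset before cross-validation yields an optimistic AUC, while performing the same selection inside each training fold does not. The sketch below is illustrative (the sample sizes, feature count, and `k` are assumptions, and the data are random), not a reproduction of the paper's experiments.

```python
# Demonstrating the leakage bias on data with NO real signal:
# feature selection before CV inflates the AUC estimate;
# selection inside each training fold stays near chance level.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))        # 60 cases, 2000 noise features
y = rng.integers(0, 2, size=60)        # random binary labels

clf = LogisticRegression(max_iter=1000)

# WRONG: selection sees all labels, including future test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
auc_leaky = cross_val_score(clf, X_leaky, y, cv=5,
                            scoring="roc_auc").mean()

# RIGHT: selection is refit inside each training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=20), clf)
auc_clean = cross_val_score(pipe, X, y, cv=5,
                            scoring="roc_auc").mean()

print(f"leaky AUC: {auc_leaky:.2f}  vs  leakage-free AUC: {auc_clean:.2f}")
```

With many more features than samples, the leaky estimate sits well above 0.5 despite the labels being random, which is the same direction of bias the quoted statement attributes to performing selection outside the cross-validation splits.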
“…This is challenging because different radiomics studies use different subsets of radiomics features to achieve optimal models. The variations in published feature selection approaches make radiomics models less clinically reproducible [23,24]. Therefore, to achieve a clinically reliable radiomics model, it is important to study and account for the effect of the variation in feature selection (FS) methods [25–27].…”
Section: Radiomics for BM Detection
confidence: 99%