2018
DOI: 10.1016/j.ijmedinf.2018.05.006

Comparison of variable selection methods for clinical predictive modeling

Abstract: The performance of classic regression-based and modern tree-based variable selection methods is associated with the size of the clinical dataset used. Classic regression-based variable selection methods seem to achieve better parsimony in clinical prediction problems in smaller datasets while modern tree-based methods perform better in larger datasets.

Citations: cited by 184 publications (161 citation statements)
References: 32 publications (43 reference statements)
“…In model 2, from the 7 traditional and 11 additional factors, relevant predictors were selected using backward selection, with the AIC stopping rule. This is an extensively used selection method in clinical prediction that seems to achieve better parsimony in smaller datasets compared to modern tree‐based methods. Variables that were selected in >33% of the 20 imputed datasets were kept in the model.…”
Section: Methods (mentioning)
confidence: 99%
“…23 This is an extensively used selection method in clinical prediction 23,24 that seems to achieve better parsimony in smaller datasets compared to modern tree-based methods. 23 Variables that were selected in >33% of the 20 imputed datasets were kept in the model. Age was forced in the model, since previous studies agreed on age as a risk factor, and the age range in this population was limited.…”
Section: Statistical Analyses (mentioning)
confidence: 99%
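The retention rule quoted above (keep a variable if it was selected in more than 33% of the 20 imputed datasets) can be illustrated with a minimal Python sketch. The function name `stable_predictors`, the variable names, and the example selections are hypothetical and not taken from the cited study; the sketch assumes backward selection has already been run once per imputed dataset and only shows the counting step.

```python
from collections import Counter


def stable_predictors(selected_per_dataset, threshold=0.33):
    """Return predictors selected in more than `threshold` of the imputed datasets.

    `selected_per_dataset` holds one list of selected variable names per
    imputed dataset (hypothetical input; the selection itself is not shown).
    """
    n_datasets = len(selected_per_dataset)
    # Count each variable at most once per dataset.
    counts = Counter(v for sel in selected_per_dataset for v in set(sel))
    return sorted(v for v, c in counts.items() if c / n_datasets > threshold)


# Hypothetical selections across 20 imputed datasets:
# "age" chosen in 20, "bmi" in 7 (35%), "smoking" in 6 (30%).
selections = [["age", "bmi"]] * 7 + [["age", "smoking"]] * 6 + [["age"]] * 7
kept = stable_predictors(selections)
```

With 20 datasets, the >33% cutoff means a variable must be selected in at least 7 of them, so in this made-up example `bmi` is kept and `smoking` is dropped.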
“…Prior studies have assessed the performance of clinical predictive models, finding, as in our study, that machine learning methods performed equivalently to standard regression analyses. Although advanced analytic methods and traditional regression models have comparable discrimination, model performance is often influenced by both the size of the cohort under study and the number of events per variable (EPV). Evidence suggests that logistic regression models perform better (in terms of accuracy, parsimony and/or discrimination) in smaller datasets with approximately 20‐50 EPV, while random forest models perform well with larger sample sizes and achieve sufficient stability when EPV exceeds 200.…”
Section: Discussion (mentioning)
confidence: 60%
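As a rough illustration of the events-per-variable (EPV) quantity mentioned in the statement above, the sketch below assumes the common definition for a binary outcome: the count of the less frequent outcome class divided by the number of candidate predictors. The function name and the example counts are hypothetical and are not drawn from any of the cited studies.

```python
def events_per_variable(n_events, n_nonevents, n_candidate_predictors):
    """Compute EPV for a binary outcome (assumed definition, see lead-in)."""
    if n_candidate_predictors <= 0:
        raise ValueError("n_candidate_predictors must be positive")
    # The limiting sample size is the smaller of the two outcome classes.
    return min(n_events, n_nonevents) / n_candidate_predictors


# Hypothetical cohort: 150 events, 850 non-events, 5 candidate predictors.
epv = events_per_variable(150, 850, 5)
```

In this made-up cohort the EPV is 30, which falls inside the approximately 20‐50 range that the quoted statement associates with logistic regression performing better.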
“…In addition, the number and type of variables selected as predictors affect model performance. Previous research has demonstrated that random forest models achieve high performance not only as more variables are selected, but also when a large number of continuous variables are used as predictors. Given these considerations, data composition, quality, and completeness should be of the utmost importance when selecting or merging clinical data sources for prediction modeling.…”
Section: Discussion (mentioning)
confidence: 99%