2018
DOI: 10.1002/bimj.201700067
Variable selection – A review and recommendations for the practicing statistician

Abstract: Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. Theory of statistical models is well‐established if the set of independent variables to consider is fixed and small. Hence, we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates shou…

Cited by 935 publications (750 citation statements)
References 66 publications
“…Multiple imputation using chained equations was performed using the R package mice for risk group and the three test results needed for determining CP score (see above); models fitted to five imputation sets were combined. Due to a low event‐to‐covariate ratio for several endpoints, variable selection for the multifactorial models was conducted using the change‐in‐estimate approach to reduce bias in the fitted regression coefficients. Specifically, augmented backward elimination using the R package abe was performed for all imputation sets (with age, sex, Child‐Pugh category, and treatment status as compulsory inclusions), and those covariates were selected that were retained in ≥50% of 100 bootstrap re‐samples.…”
Section: Methods
confidence: 99%
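The change‐in‐estimate rule behind augmented backward elimination can be illustrated with a minimal pure‑Python sketch on hypothetical simulated data (the cited analysis used the R packages mice and abe; the variable names, coefficients, and 0.05 threshold below are illustrative assumptions): a candidate covariate is kept whenever removing it shifts the coefficient of a variable of interest by more than a relative threshold.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][k] - f * A[c][k] for k in range(p)]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][k] * beta[k] for k in range(r + 1, p))) / A[r][r]
    return beta

# Hypothetical data: exposure x1, correlated confounder x2, outcome y.
random.seed(1)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 0.6) for a in x1]  # confounder correlated with x1
y = [1 + 2 * a + 1.5 * c + random.gauss(0, 1) for a, c in zip(x1, x2)]

full = ols([[1, a, c] for a, c in zip(x1, x2)], y)  # model including x2
reduced = ols([[1, a] for a in x1], y)              # model without x2

# Change-in-estimate rule: keep x2 if dropping it moves the x1 coefficient
# by more than a relative threshold (0.05 is a commonly used choice).
change = abs(reduced[1] - full[1]) / abs(full[1])
keep_x2 = change > 0.05
```

Because x2 is a genuine confounder here, dropping it biases the x1 coefficient noticeably, so the rule retains it; a covariate whose removal barely moves the estimate of interest would be eliminated.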
“…Backward elimination starts from the full generalized linear regression model and eliminates candidate variables one per cycle, each time removing the variable with the smallest contribution to the outcome, until a stopping criterion is met and only significant candidate variables remain in the final model (remaining‐variable significance P < .10). Backward elimination is a recommended method of model selection because it starts from the full generalized linear model, whose estimates are assumed to be unbiased. Analyses for this study were conducted using SAS statistical software (SAS Institute Inc., Cary, NC).…”
Section: Methods
confidence: 99%
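A simplified Python stand‑in for that elimination loop, on hypothetical simulated data (the cited study used SAS and a P < .10 significance rule; here an F‑type statistic on the increase in residual sum of squares plays that role, and the cutoff f_drop=4.0 is an illustrative assumption, not the exact P < .10 critical value):

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][k] - f * A[c][k] for k in range(p)]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][k] * beta[k] for k in range(r + 1, p))) / A[r][r]
    return beta

def rss(X, y):
    """Residual sum of squares of the least-squares fit."""
    beta = ols(X, y)
    return sum((yi - sum(b * xij for b, xij in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def backward_eliminate(X, y, names, protected=("intercept",), f_drop=4.0):
    """One variable per cycle: drop the candidate whose removal increases the
    RSS least, as long as its F-type statistic stays below f_drop."""
    X, names = [row[:] for row in X], list(names)
    while True:
        base, n, p = rss(X, y), len(X), len(X[0])
        weakest = None
        for j, nm in enumerate(names):
            if nm in protected:
                continue
            Xr = [row[:j] + row[j + 1:] for row in X]
            f = (rss(Xr, y) - base) / (base / (n - p))
            if weakest is None or f < weakest[0]:
                weakest = (f, j)
        if weakest is None or weakest[0] >= f_drop:
            return names
        j = weakest[1]
        X = [row[:j] + row[j + 1:] for row in X]
        del names[j]

# Hypothetical data: y depends on x1 and x2 only; x3 and x4 are pure noise.
random.seed(2)
n = 200
cols = {nm: [random.gauss(0, 1) for _ in range(n)] for nm in ("x1", "x2", "x3", "x4")}
y = [2 * cols["x1"][i] - 1.5 * cols["x2"][i] + random.gauss(0, 1) for i in range(n)]
X = [[1] + [cols[nm][i] for nm in ("x1", "x2", "x3", "x4")] for i in range(n)]
selected = backward_eliminate(X, y, ["intercept", "x1", "x2", "x3", "x4"])
```

Variables with a strong contribution (x1 and x2) survive every cycle, since removing them inflates the residual sum of squares far beyond the cutoff.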
“…Statistical significance was set at α = 0.05. The model reduction method deviated from the recommended guidelines of Heinze et al in two ways: (1) variable selection was performed with an events‐per‐variable ratio of less than 10; our initial model included only eight explanatory variables (events per variable ≈ 18), but an additional 10 explanatory variables were added to satisfy reviewer concerns, although these were otherwise not felt to be strongly predictive of ΔKF; and (2) shrinkage estimation was not applied to the reduced model coefficients; instead, bootstrap mean values of the reduced model coefficients were calculated from 50 bootstrap samples, as we felt this would provide a reasonable estimate of model robustness relative to the study sample. All statistics were performed using the Matlab Statistics Toolbox (Mathworks, Natick, MA, USA).…”
Section: Methods
confidence: 99%
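The bootstrap‑averaging step described above can be sketched in a few lines of Python on hypothetical simulated data (the cited study used the MATLAB Statistics Toolbox; the single‑predictor reduced model and its coefficients below are illustrative assumptions): the reduced model is refitted on 50 resamples drawn with replacement and the coefficients are averaged, with no shrinkage applied.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][k] - f * A[c][k] for k in range(p)]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][k] * beta[k] for k in range(r + 1, p))) / A[r][r]
    return beta

# Hypothetical reduced model: y = b0 + b1 * x1 + noise.
random.seed(3)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * a + random.gauss(0, 1) for a in x1]
X = [[1, a] for a in x1]

# Refit on 50 bootstrap resamples (drawn with replacement) and average the
# coefficients, as a rough check of model robustness; no shrinkage is applied.
B = 50
boots = []
for _ in range(B):
    idx = [random.randrange(n) for _ in range(n)]
    boots.append(ols([X[i] for i in idx], [y[i] for i in idx]))
boot_mean = [sum(b[j] for b in boots) / B for j in range(2)]
```

Averaging over resamples indicates how stable the fitted coefficients are under resampling of the study sample, though unlike shrinkage it does not correct the optimism of a selected model.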