2020
DOI: 10.48550/arxiv.2003.07398
Preprint

Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods

Abstract: Penalized regression methods, such as lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection is desired. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm on each imputed dataset will likely lead to different sets of selected predictors, making it difficult to ascertain a final active set without reso…

Cited by 2 publications (4 citation statements) | References 33 publications

Citation statements:
“…We will assess this flexible model in our sensitivity analysis. To avoid overfitting, we will employ the elastic net for penalized estimation of the regression coefficients [28,29]. The elastic net allows both selection and penalization of main effects by introducing two tuning parameters.…”
Section: Model Development
confidence: 99%
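
As a hedged illustration only (not code from the cited study), penalized estimation with the elastic net and its two tuning parameters can be sketched in Python with scikit-learn; the data, variable names, and tuning grid below are hypothetical.

# Minimal sketch: elastic net with two tuning parameters, alpha (overall
# penalty strength) and l1_ratio (mixing weight between lasso and ridge penalties).
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))            # hypothetical predictor matrix
beta = np.zeros(20)
beta[:3] = [1.5, -2.0, 1.0]               # only three nonzero main effects
y = X @ beta + rng.normal(size=200)       # hypothetical outcome

# Cross-validate over both tuning parameters; coefficients shrunk exactly to
# zero are treated as deselected main effects.
fit = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
print("selected predictors:", np.flatnonzero(fit.coef_ != 0))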
“…Although various models have been proposed to address the inconsistency of variable selection in MI settings, MI-LASSO [17] was the first to introduce the Group-LASSO [24] penalty into this problem. Unlike stacking methods [10,11,15], which "stack" the multiply-imputed data into a single dataset and apply weighted models to select important variables, MI-LASSO treats the same variable across all imputed sets as a group of variables and adopts Group-LASSO to jointly include or exclude the whole group. To make this clear, the mathematical loss function for Group-LASSO is:…”
Section: MI-LASSO
confidence: 99%
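
The quoted statement cuts off before the loss function. As a hedged reconstruction in standard group-lasso notation (not copied from the citing paper), the group-lasso objective over G predefined groups is

\[
\min_{\beta}\;\frac{1}{2}\Bigl\lVert y-\sum_{g=1}^{G}X_{g}\beta_{g}\Bigr\rVert_{2}^{2}+\lambda\sum_{g=1}^{G}\sqrt{p_{g}}\,\lVert\beta_{g}\rVert_{2},
\]

and in the MI-LASSO setting each group collects the D imputation-specific copies of one coefficient, giving

\[
\sum_{d=1}^{D}\sum_{i=1}^{n}\Bigl(y_{i}^{(d)}-\sum_{j=1}^{p}x_{ij}^{(d)}\beta_{j}^{(d)}\Bigr)^{2}+\lambda\sum_{j=1}^{p}\sqrt{\sum_{d=1}^{D}\bigl(\beta_{j}^{(d)}\bigr)^{2}},
\]

so each variable is included or excluded jointly across all imputed datasets.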
“…Therefore, an information-based criterion was used to select the x% credible interval for the shrinkage Bayesian MI-LASSO models and to evaluate their performance. We took advantage of a modified version of the Bayesian Information Criterion (BIC; see formulas (10) and (11)) to assess different choices of credible interval. This modified BIC was also used to evaluate the five Bayesian MI-LASSO models on the data.…”
Section: Performances
confidence: 99%
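
The formulas (10) and (11) referenced in the quote belong to the citing paper and are not reproduced here. As a hedged sketch of the generic form such a criterion usually takes (an assumption, not the citing paper's modified BIC), for a fit with residual sum of squares RSS and an estimated number of effectively selected parameters \(\widehat{df}\) on n observations,

\[
\mathrm{BIC}=n\,\log\!\Bigl(\frac{\mathrm{RSS}}{n}\Bigr)+\log(n)\,\widehat{df},
\]

with candidate credible-interval thresholds compared by this criterion and smaller values preferred.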