Multiple imputation in the presence of high-dimensional data

Zhao, Yize; Long, Qi

doi:10.1177/0962280213511027

Cited by 72 publications

(64 citation statements)

References 41 publications

(50 reference statements)


“…Different penalty specifications give rise to various regularized regression methods. Zhao and Long (2013) [20] investigated the use of regularized regression for MI including lasso [21], elastic net [22] (EN), and adaptive lasso [23] (Alasso). They also developed MI using a Bayesian lasso approach.…”

mentioning

confidence: 99%

Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data

Deng, Chang, Ido, et al. 2016. Sci Rep

Self Cite


Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches of using regularized regression to impute missing values of high-dimensional data that can handle general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our simulation results demonstrate the superiority of the proposed MICE approach based on an indirect use of regularized regression in terms of bias. We further illustrate the proposed methods using two data examples.
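For orientation, the regularized regression methods named in the citation statement above (lasso, elastic net, adaptive lasso) differ only in the penalty added to the least-squares criterion. The block below writes out the standard textbook forms. The symbols (`\lambda`, `\lambda_1`, `\lambda_2`, `w_j`, `\gamma`, `\tilde\beta_j`) and the `1/(2n)` scaling are notational assumptions, not taken from the cited papers, and conventions vary between software packages.

```latex
% Penalized least squares with a generic penalty P(\beta)
\hat{\beta} = \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + P(\beta)

% Lasso
P_{\text{lasso}}(\beta) = \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% Elastic net (EN)
P_{\text{EN}}(\beta) = \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
                     + \lambda_2 \sum_{j=1}^{p} \beta_j^2

% Adaptive lasso (Alasso), with weights from an initial estimate \tilde{\beta}
P_{\text{Alasso}}(\beta) = \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert,
\qquad w_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}},\quad \gamma > 0
```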



“…Some recent work by Zhao and Long [49] and Deng et al [50] has investigated imputation methods in the presence of high-dimensional data, but methods in this area are largely under-developed and additional research is urgently needed.…”

Section: Discussion

mentioning

confidence: 99%

Variable selection in the presence of missing data: imputation‐based methods

Zhao, Long. 2017. WIREs Computational Stats

Self Cite


Variable selection plays an essential role in regression analysis as it identifies important variables that are associated with outcomes and is known to improve predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third variable selection strategy combines resampling techniques such as bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
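The first two strategies in the abstract above can be illustrated with a small sketch. The abstract does not say how selection results are combined, so the frequency-threshold rule below is an assumption, and the names `combine_selections`, `stack_datasets`, and `min_fraction`, as well as the example variable names, are hypothetical.

```python
from collections import Counter

def combine_selections(selected_per_dataset, min_fraction=0.5):
    """Strategy 1 (sketch): keep variables selected in at least
    `min_fraction` of the imputed datasets. The threshold rule is an
    assumption, not a rule stated in the abstract."""
    m = len(selected_per_dataset)
    counts = Counter(v for sel in selected_per_dataset for v in set(sel))
    return sorted(v for v, c in counts.items() if c / m >= min_fraction)

def stack_datasets(imputed_datasets):
    """Strategy 2 (sketch): stack the rows of all imputed datasets into
    one long dataset, to which a selection method is then applied once."""
    return [row for dataset in imputed_datasets for row in dataset]

# Hypothetical selection results from four imputed datasets.
selections = [
    {"age", "bmi", "x7"},
    {"age", "x7"},
    {"age", "bmi"},
    {"age", "x3"},
]
kept = combine_selections(selections, min_fraction=0.5)

# Two tiny imputed datasets (rows as tuples) stacked into one.
stacked = stack_datasets([[(1.0, 2.0), (3.0, 4.0)], [(1.0, 2.5), (3.0, 4.0)]])
```

A stricter threshold (for example `min_fraction=1.0`) keeps only variables selected in every imputed dataset. Which rule is appropriate depends on the analysis.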


“…Serum 25(OH)D level (ng/ml), ethnicity and gender were the only variables in the dataset that did not have any missing information. It has now been recognised that complete case analysis without adequate handling of missing data may lead to biased results, or reduced power and precision of estimates (Zhao and Long, 2016). We assumed that the missing variables were missing at random (MAR).…”

Section: Multiple Imputation

mentioning

confidence: 99%

“…We imputed the missing values using multiple imputation by chained equations (MICE) (Deng et al, 2016). MICE has been shown to be a robust method for dealing with missing data across empirical and longitudinal studies (He et al, 2011; Zhao and Long, 2016). In the MICE procedure a series of regression models are run whereby each variable with missing data is modelled according to its distribution (Azur et al, 2011); for continuous variables, this would be a multivariable linear regression; and for binary variables, a logistic regression.…”

Section: Multiple Imputation

mentioning

confidence: 99%
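The chained-equations procedure described in the statement above can be sketched in a few lines. This is a deliberately simplified illustration, not the implementation used in any of the cited studies: it assumes exactly two continuous variables, imputes each from a simple linear regression on the other, and adds only residual noise (a full MICE implementation would also draw the regression parameters and would use logistic regression for binary variables). All function names and the toy data are hypothetical.

```python
import random

def fit_simple_ols(xs, ys):
    """Least-squares intercept, slope, and residual SD for y ~ x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / max(n - 2, 1)) ** 0.5

def chained_imputation(data, n_iter=10, rng=None):
    """Return one completed copy of `data` (dict: name -> list, None = missing).
    Assumes exactly two continuous columns."""
    rng = rng or random.Random(0)
    first, second = list(data)
    missing = {k: [v is None for v in data[k]] for k in data}
    # Start from mean imputation, then refine by cycling through the variables.
    filled = {}
    for k in data:
        observed = [v for v in data[k] if v is not None]
        mean = sum(observed) / len(observed)
        filled[k] = [mean if v is None else v for v in data[k]]
    for _ in range(n_iter):
        for target, predictor in ((first, second), (second, first)):
            # Fit on rows where the target was actually observed.
            rows = [i for i, is_missing in enumerate(missing[target]) if not is_missing]
            b0, b1, sd = fit_simple_ols(
                [filled[predictor][i] for i in rows],
                [filled[target][i] for i in rows],
            )
            # Replace missing target values with prediction plus residual noise.
            for i, is_missing in enumerate(missing[target]):
                if is_missing:
                    filled[target][i] = b0 + b1 * filled[predictor][i] + rng.gauss(0.0, sd)
    return filled

def multiply_impute(data, m=5, seed=0):
    """Create m completed datasets, each from its own random stream."""
    return [chained_imputation(data, rng=random.Random(seed + j)) for j in range(m)]

# Toy data with missing values in both variables.
data = {
    "a": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "b": [2.1, None, 6.2, 8.1, None, 11.9],
}
completed = multiply_impute(data, m=3)
```

Each element of `completed` is one imputed dataset. The analysis model would be fitted to each and the results pooled afterwards.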

Vitamin D and clinical symptoms in First Episode Psychosis (FEP): A prospective cohort study

Lally, Ajnakina, Singh, et al. 2019. Schizophrenia Research


We identified a prospective association between higher baseline serum Vitamin D levels and lower total psychotic symptoms and negative symptoms of psychosis at 12 months after first contact for psychosis. The results of this study require replication in larger prospective studies, and highlight the need for large randomised trials to assess the effect of vitamin D supplementation on symptoms of psychosis in FEP.
