2017
DOI: 10.1186/s12874-017-0360-2

A nonparametric multiple imputation approach for missing categorical data

Abstract: Background: Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. Methods: We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values.…
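
The donor-set idea in the abstract can be illustrated with a minimal Python sketch. The abstract is truncated here, so the details are assumptions rather than the authors' method: distance is taken as the absolute difference on a single numeric covariate `x`, the donor set is the `k` nearest observed cases, and each imputation draws one donor's category at random. The names `nn_multiple_imputation`, `k`, `m`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def nn_multiple_imputation(records, k=3, m=5, seed=0):
    """Estimate category proportions by nearest-neighbour multiple imputation.

    records: list of (x, y) pairs, where x is a numeric covariate and y is a
    category label, or None when the outcome is missing.
    Returns a dict of category proportions averaged over m imputed data sets.
    Assumption: distance is |x_i - x_j| on one covariate (illustrative only).
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in records if y is not None]
    if not observed:
        raise ValueError("need at least one observed outcome")
    categories = sorted({y for _, y in observed})

    totals = Counter()
    for _ in range(m):
        # Start each imputed data set from the observed category counts.
        counts = Counter(y for _, y in observed)
        for x, y in records:
            if y is None:
                # Donor set: the k observed cases closest to this case.
                donors = sorted(observed, key=lambda d: abs(d[0] - x))[:k]
                # Impute by drawing one donor's category at random.
                counts[rng.choice(donors)[1]] += 1
        for c in categories:
            totals[c] += counts[c] / len(records)

    # Average the per-data-set proportions across the m imputations.
    return {c: totals[c] / m for c in categories}
```

For example, `nn_multiple_imputation([(0.0, "a"), (0.2, "a"), (0.1, None), (10.0, "b")], k=2)` fills the missing outcome from the two nearest observed cases, both of which are in category `"a"`.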

Citations: cited by 6 publications (5 citation statements)
References: 23 publications
“…KNN and RF have been reported to have excellent imputation performance in relevant studies [5][6][7][8][9][10][27], but these researches were based on real-data applications with continuous or mixed variables in limited application scenarios. In this study, KNN only had a relatively moderate imputation accuracy in scenarios with a value distribution of 7:3, but it had poor performance in most scenarios, which may be caused by the local structure of data in these simulation scenarios.…”
Section: Discussion (mentioning)
confidence: 99%
“…Many researchers have successively used different datasets to compare the performance of traditional statistical and machine learning imputation methods, but the conclusions were different. Wei et al. [5], Waljee et al. [6], and Shah et al. [7] demonstrated respectively that RF outperforms other imputation methods in their datasets; Jerez et al. [8], Zhou et al. [9], and Jadhav et al. [10] found KNN outperforms other imputation methods in their datasets; Chlioui et al. [11] found SVM performs best in two numeric datasets, while Tsai [12] found DT performs best in mixed datasets. Furthermore, both the ensemble learning (EL) algorithm proposed by Wang [13] and the generative adversarial imputation nets (GAIN) algorithm proposed by Dong [14] have been reported as possessing satisfactory imputation performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…[11] Multiple imputation was applied for variables with less than 30% missing data [12, 13]. Potential covariates included preoperative demographic variables routinely collected in the secure clinical repository. A total of 27 preoperative demographic variables were included: age, body mass index, sex, race, smoking history, alcohol use history, hypertension, diabetes mellitus (type I or II), autoimmune conditions (systemic lupus erythematosus, etc.…”
Section: Methods (mentioning)
confidence: 99%
“…Methods of imputations were as follows: predictive mean matching for continuous variables, logistic regression imputation for dichotomous variables, and polytomous regression imputation for categorical variables with more than 2 levels [24][25][26]. Multiple imputation processes were repeated for all variables (1 predictor, 10 outcomes, and 23 covariates) five times to generate five complete, imputed data sets using the "mice" function of the "mice" package [24] in R software [27]. Estimates were calculated by pooling the five sets of results from logistic regression models of five imputed data sets, using the "pool" function of the "mice" package.…”
Section: Multiple Imputation Process and Analyses (mentioning)
confidence: 99%
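
The pooling step in the last statement combines the per-data-set results using Rubin's rules. The following is a minimal Python sketch of that arithmetic, not the "mice" implementation: it assumes each imputed data set yields one coefficient estimate and its squared standard error, and it covers only the pooled estimate and total variance (no degrees of freedom or confidence intervals). The function name `pool_rubin` is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules.

    estimates: point estimates, one per imputed data set.
    variances: squared standard errors, one per imputed data set.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

With five imputed data sets, as in the statement above, `estimates` and `variances` would each hold five values, one per fitted logistic regression model.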