Missing-data imputation can improve the performance of prediction models when missing values hide useful information. This paper compares methods for imputing missing categorical data in supervised classification tasks. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data under different levels of additional missing-data perturbation. We show that imputation methods can increase predictive accuracy in the presence of missing-data perturbation, and that the perturbation itself can improve prediction accuracy by regularizing the classifier. With missing-data perturbation and k-nearest-neighbors (k-NN) imputation, we achieve state-of-the-art results on the Adult dataset.
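The combination the abstract describes can be sketched briefly. The snippet below is a minimal illustration, not the paper's pipeline: it assumes categorical features have been ordinally encoded as integer codes (a simplification; the paper also considers one-hot encoding), randomly masks entries to simulate missing-data perturbation, and fills them with scikit-learn's `KNNImputer`.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Toy data: 200 samples, 3 categorical features ordinally encoded
# as integer codes 0..3 (illustrative, not a real dataset).
X = rng.integers(0, 4, size=(200, 3)).astype(float)

# Missing-data perturbation: randomly mask ~10% of entries as NaN.
mask = rng.random(X.shape) < 0.10
X_perturbed = X.copy()
X_perturbed[mask] = np.nan

# k-NN imputation: each NaN is replaced by the mean feature value
# among the k nearest neighbors measured on the observed entries.
imputer = KNNImputer(n_neighbors=5)
X_imputed = imputer.fit_transform(X_perturbed)

# Round back to the nearest category code, since the k-NN mean
# of integer codes is generally fractional.
X_imputed = np.rint(X_imputed)
```

A classifier would then be trained on `X_imputed`; varying the mask rate controls the strength of the perturbation.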
Multiple imputation (MI) is the state-of-the-art approach for dealing with missing data arising from non-response in sample surveys. Multiple imputation by chained equations (MICE) is the most widely used MI method, but it lacks a theoretical foundation and is computationally intensive. Recently, MI methods based on deep learning models have been developed, with encouraging results in small studies. However, there has been limited research systematically evaluating their performance in realistic settings compared with MICE, particularly in large-scale surveys. This paper provides a general framework for comparing MI methods using simulations based on real survey data and several performance metrics. We conduct extensive simulation studies based on American Community Survey data to compare the repeated-sampling properties of four machine-learning-based MI methods: MICE with classification trees, MICE with random forests, generative adversarial imputation networks, and multiple imputation using denoising autoencoders. We find that the deep-learning-based MI methods dominate MICE in computational time; however, MICE with classification trees consistently outperforms the deep learning MI methods in bias, mean squared error, and coverage under a range of realistic settings.
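The MICE-with-trees idea can be approximated in scikit-learn. This is a hedged sketch, not the paper's implementation: `IterativeImputer` (still marked experimental) cycles through columns, modeling each from the others with a supplied estimator; running it m times with different seeds and a randomized imputation order yields m completed datasets, the "multiple" in multiple imputation. The data here are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Synthetic data with one correlated column, then ~15% values masked.
X = rng.normal(size=(300, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=300)
X[rng.random(X.shape) < 0.15] = np.nan

# Chained-equations imputation with a tree as the conditional model;
# m runs with different seeds give m completed datasets.
m = 5
completed = [
    IterativeImputer(
        estimator=DecisionTreeRegressor(max_depth=5, random_state=s),
        max_iter=10,
        imputation_order="random",
        random_state=s,
    ).fit_transform(X)
    for s in range(m)
]
```

Downstream analyses would be run on each completed dataset and pooled, e.g., with Rubin's rules.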
Objective: To test whether there are significant, evidence-based differences in effectiveness between self-ligation (SL) and conventional-ligation (CL) brackets.
Materials and Methods: Popular clinical claims about SL were identified through a literature overview of PubMed, EMBASE, the Cochrane Library, and Web of Science for the period 1965-2017, supplemented by hand searching of the references of retrieved articles. Articles meeting the inclusion criteria were qualitatively analyzed using the Cochrane risk-of-bias tool and one other scale. Eligible RCTs were statistically analyzed with weighted-means calculations and forest plots. RCT data that could not yet be synthesized with at least one other RCT were reserved for discussion.
Results: Ten RCTs satisfied the inclusion criteria, six of which were matched for meta-analysis of three popular clinical claims. At the 95% confidence level, differences between SL and CL in space-closure rate, reduction of incisor proclination, and rate of mandibular alignment were not statistically significant. The remaining four RCTs, analyzed collectively, showed no statistically significant difference in discomfort between SL and CL.
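The weighted-means approach underlying such a meta-analysis can be illustrated with a fixed-effect (inverse-variance) pooled estimate. The effect sizes and standard errors below are hypothetical placeholders, not the review's actual data; the point is only the arithmetic behind "not statistically significant at the 95% level."

```python
import math

# Hypothetical per-study mean differences (SL minus CL) and
# standard errors -- illustrative numbers only.
effects = [0.12, -0.05, 0.20]
ses = [0.10, 0.08, 0.15]

# Inverse-variance weights: precise studies count for more.
weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
se_pool = math.sqrt(1 / sum(weights))

# 95% confidence interval for the pooled difference.
ci_low = pooled - 1.96 * se_pool
ci_high = pooled + 1.96 * se_pool

# If the CI spans 0, the pooled difference is not statistically
# significant at the 5% level.
significant = not (ci_low <= 0 <= ci_high)
```

In a forest plot, each `(effect, se)` pair is one study row and the pooled estimate is the diamond at the bottom.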
Conclusion: The null hypothesis that there are no differences between SL and CL was not rejected, as no statistically significant differences were found. Additional studies of active SL brackets, and well-designed RCTs suitable for meta-analysis that include overall treatment time, are needed. Chair-time efficiency was consistently higher with SL than with CL.