Missing values in a dataset is one of the most common difficulties in real applications. Many different techniques based on machine learning have been proposed in the literature to face this problem. In this work, the great representation capability of the stacked denoising auto-encoders is used to obtain a new method of imputating missing values based on two ideas: deletion and compensation. This method improves imputation performance by artificially deleting values in the input features and using them as targets in the training process. Nevertheless, although the deletion of samples is demonstrated to be really efficient, it may cause an imbalance between the distributions of the training and the test sets. In order to solve this issue, a compensation mechanism is proposed based on a slight modification of the error function to be optimized. Experiments over several datasets show that the deletion and compensation not only involve improvements in imputation but also in classification in comparison with other classical techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.