Missing Value Imputation Based on Data Clustering

Zhang, Shichao; Zhang, Jilian; Zhu, Xiaofeng; Qin, Yongsong; Zhang, Chengqi

doi:10.1007/978-3-540-79299-4_7

Cited by 75 publications

(54 citation statements)

References 22 publications


“…The most popular method in statistics is the regression imputation method. Common regression methods include the parametric methods (such as linear regression and the nonlinear imputation method) and the non-parametric methods (such as kernel imputation in [28]). The parametric regression imputations are superior if a dataset can be adequately modeled parametrically, or if users can correctly specify the parametric forms for the dataset.…”

Section: Related Work

mentioning

confidence: 99%


Shell-neighbor method and its application in missing data imputation

Zhang

2010

Appl Intell

Self Cite

110


Data preparation is an important step in mining incomplete data. To deal with this problem, this paper introduces a new imputation approach called SN (Shell Neighbors) imputation, or simply SNI. The SNI fills in an incomplete instance (with missing values) in a given dataset by only using its left and right nearest neighbors with respect to each factor (attribute), referred them to Shell Neighbors. The left and right nearest neighbors are selected from a set of nearest neighbors of the incomplete instance. The size of the sets of the nearest neighbors is determined with the cross-validation method. And then the SNI is generalized to deal with missing data in datasets with mixed attributes, for example, continuous and categorical attributes. Some experiments are conducted for evaluating the proposed approach, and demonstrate that the generalized SNI method outperforms the kNN imputation method at imputation accuracy and classification accuracy.


Section: Related Work

mentioning

confidence: 99%

“…Another commonly used and efficient imputation is the k nearest neighbor imputation (called kNN imputation, or kNNI), which is one of the hot deck techniques used to compensate for missing data [3,28]. It uses only the k most relevant complete instances in the dataset for imputing a missing datum.…”

mentioning

confidence: 99%
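The kNN imputation idea quoted above (fill a missing datum from the k most relevant complete instances) can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `knn_impute`, the Euclidean distance over the observed attributes, the use of the neighbours' mean as the fill value, and the toy data are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Assumption: distance is Euclidean, computed only over the attributes
    that are observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("kNN imputation needs at least one complete row")

    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Rank the complete rows by distance on the observed attributes.
        neighbours = sorted(
            complete,
            key=lambda c: math.sqrt(sum((c[j] - row[j]) ** 2 for j in observed)),
        )[:k]
        # Replace each missing value with the neighbours' mean for that attribute.
        filled = [
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed


# Hypothetical toy data: the last row is missing its third attribute.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],
    [1.0, 2.0, None],
]
result = knn_impute(data, k=2)
print(result[3])
```

With k=2 the distant third row is ignored, so the fill value comes only from the two rows that resemble the incomplete one. In practice k would be tuned, for example by cross-validation.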


“…A grade table is constructed where courses are columns and students are rows, courses are labeled from T1, students are labeled from S1. The missing value in the table should be processed. The techniques of missing value imputation are: list wise deletion, mean imputation and some types of hot-deck imputation [8][9][10]. The listwise deletion is used to deal with missing value in the paper.…”

Section: Advances In Computer Science Research (Acsr) Volume 73

mentioning

confidence: 99%
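Two of the techniques named in the statement above, listwise deletion and mean imputation, can be contrasted on a small grade table. This is only a sketch under stated assumptions: the function names, the use of `None` for a missing grade, and the grade values are hypothetical and are not taken from the cited paper.

```python
def listwise_delete(table):
    """Drop every row (student) that has at least one missing value."""
    return [row for row in table if None not in row]


def mean_impute(table):
    """Replace each missing value with the mean of its column (course)."""
    n_cols = len(table[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in table if row[j] is not None]
        # A column with no observed values cannot be imputed; keep None.
        means.append(sum(observed) / len(observed) if observed else None)
    return [
        [means[j] if v is None else v for j, v in enumerate(row)]
        for row in table
    ]


# Hypothetical grade table: rows are students S1..S3, columns are courses T1..T3.
grades = [
    [80, 90, 70],    # S1
    [60, None, 75],  # S2
    [70, 85, None],  # S3
]
kept = listwise_delete(grades)
filled = mean_impute(grades)
print(kept)
print(filled)
```

Listwise deletion keeps only the fully observed students, which is simple but discards data, while mean imputation keeps every student at the cost of shrinking each course's variance.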

Application in the Teaching of Principal Component Analysis

Ren, Dai

2017

Proceedings of the 7th International Conference on Education, Management, Information and Computer Science (ICEMC 2017)


Abstract. The classroom is the main ways for students to obtain knowledge, there are a lot of curriculum knowledge points and complicated, so it is difficult for students that are required to master all the knowledge. In order to improve the learning effect, the knowledge points should be distinguished to the primary and secondary of the knowledge points effectively, and then the important points of knowledge learning are strengthened to improve the quality of teaching. In this paper, the course of computer foundation as an example, the examination results of students are analyzed by principal component analysis method to know the main influencing factors of the courses, which lays a solid foundation to better carry out the teaching of this course.


“…Parametric methods like Nearest Neighbour [4][10] [25] have been used for the prediction of missing attribute(s). Non-parametric technique such as empirical likelihood [32], clustering [26], Semi-parametric techniques [21] [33] have also been applied for missing data imputation. Techniques like mixture model clustering [9], machine learning [12] have been used for imputing missing data.…”

Section: Related Work

mentioning

confidence: 99%

Imputation And Classification Of Missing Data Using Least Square Support Vector Machines – A New Approach In Dementia Diagnosis

Sivapriya, Kamal, Thavavel

2012

IJARAI


Abstract. This paper presents a comparison of different data imputation approaches used in filling missing data and proposes a combined approach to estimate accurately missing attribute values in a patient database. The present study suggests a more robust technique that is likely to supply a value closer to the one that is missing for effective classification and diagnosis. Initially data is clustered and z-score method is used to select possible values of an instance with missing attribute values. Then multiple imputation method using LSSVM (Least Squares Support Vector Machine) is applied to select the most appropriate values for the missing attributes. Five imputed datasets have been used to demonstrate the performance of the proposed method. Experimental results show that our method outperforms conventional methods of multiple imputation and mean substitution. Moreover, the proposed method CZLSSVM (Clustered Z-score Least Square Support Vector Machine) has been evaluated in two classification problems for incomplete data. The efficacy of the imputation methods have been evaluated using LSSVM classifier. Experimental results indicate that accuracy of the classification is increases with CZLSSVM in the case of missing attribute value estimation. It is found that CZLSSVM outperforms other data imputation approaches like decision tree, rough sets and artificial neural networks, K-NN (K-Nearest Neighbour) and SVM. Further it is observed that CZLSSVM yields 95 per cent accuracy and prediction capability than other methods included and tested in the study.
