2017
DOI: 10.1016/j.jss.2017.07.012

Cross-validation based K nearest neighbor imputation for software quality datasets: An empirical study

Abstract: Being able to predict software quality is essential, but it also poses significant challenges in software engineering. Historical software project datasets are often used together with various machine learning algorithms for fault-proneness classification. Unfortunately, missing values in these datasets have negative impacts on estimation accuracy and could therefore lead to inconsistent results. As a method for handling missing data, K nearest neighbor (KNN) imputation is gradually gaining acceptance in emp…
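
The title describes KNN imputation in which K is selected by cross-validation. The truncated abstract does not state the paper's distance metric, donor rule, or fold scheme, so the Python sketch below rests on assumptions. It uses Euclidean distance over co-observed features, imputes the mean of the K nearest donors, and scores each candidate K with 10 folds over the observed cells by mean absolute error. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing cell


def distance(a: Row, b: Row) -> float:
    """Euclidean distance over the features observed in both rows, scaled up
    by len(a) / shared so rows with fewer shared features are not favoured
    (one common convention, assumed here)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def impute_cell(data: Sequence[Row], i: int, j: int, k: int) -> float:
    """Estimate data[i][j] as the mean of column j over the k nearest rows
    that actually observe column j."""
    candidates = [
        (distance(data[i], row), row[j])
        for r, row in enumerate(data)
        if r != i and row[j] is not None
    ]
    if not candidates:
        raise ValueError(f"column {j} has no observed donor values")
    candidates.sort(key=lambda pair: pair[0])
    donors = [value for _, value in candidates[:k]]
    return sum(donors) / len(donors)


def knn_impute(data: Sequence[Row], k: int) -> List[Row]:
    """Fill every missing cell, always reading donors from the original data."""
    filled = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is None:
                filled[i][j] = impute_cell(data, i, j, k)
    return filled


def choose_k_by_cv(data: Sequence[Row], candidate_ks: Sequence[int],
                   folds: int = 10, seed: int = 0) -> int:
    """Split the observed cells into folds, hide one fold at a time, impute
    the hidden cells with each candidate K, and return the K with the lowest
    mean absolute error."""
    observed = [(i, j) for i, row in enumerate(data)
                for j, value in enumerate(row) if value is not None]
    random.Random(seed).shuffle(observed)
    fold_cells = [observed[f::folds] for f in range(folds)]
    best_k, best_error = candidate_ks[0], math.inf
    for k in candidate_ks:
        total, count = 0.0, 0
        for cells in fold_cells:
            masked = [list(row) for row in data]
            for i, j in cells:
                masked[i][j] = None
            for i, j in cells:
                total += abs(impute_cell(masked, i, j, k) - data[i][j])
                count += 1
        error = total / count
        if error < best_error:
            best_k, best_error = k, error
    return best_k


# Toy usage: column 1 is exactly twice column 0; the last row lacks column 1.
data = [[1, 2], [2, 4], [3, 6], [4, 8], [5, 10], [6, None]]
best_k = choose_k_by_cv(data, candidate_ks=[1, 2, 3])
print(best_k, knn_impute(data, best_k))
print(knn_impute(data, 2)[5][1])  # 9.0: mean of the donors 10 and 8
```

Scoring candidates on cells whose true values are known, then hiding them, is what makes the choice of K data-driven rather than fixed in advance.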

Cited by 74 publications (40 citation statements)
References: 65 publications (95 reference statements)

“…Missing Data Ignoring can be recommended when the data in a dataset are MCAR [missing completely at random] or when the level of missing data is low [17,24]. b) Missing Data Toleration: This technique relies on internal treatment, where missing data in the dataset is tolerated and analysis is performed directly on the dataset. One such toleration approach is to assign a NULL value to replace the missing piece of data [17,18,26]. c) Missing Data Imputation: Various strategies are employed for missing data imputation, in which the missing values found in the dataset are filled in so that the complete dataset can be analyzed.…”
Section: Mechanisms of Missing (mentioning)
confidence: 99%
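
The three strategies in this excerpt can be contrasted on a toy table. In the Python sketch below, the "?" raw missing-value marker and the function names are hypothetical. Column-mean imputation stands in for "imputation" because it is one of the techniques mentioned in the excerpts; other imputers, such as KNN, slot into the same place.

```python
from statistics import mean
from typing import List, Optional, Sequence

MISSING = "?"  # hypothetical raw-data marker for a missing cell
Row = List[Optional[float]]


def ignore(data: Sequence[Row]) -> List[Row]:
    """a) Ignoring: listwise deletion, i.e. keep only the complete rows."""
    return [list(row) for row in data if all(v is not None for v in row)]


def tolerate(raw: Sequence[Sequence[str]]) -> List[Row]:
    """b) Toleration: replace each missing cell with None (NULL), keep every row."""
    return [[None if cell == MISSING else float(cell) for cell in row] for row in raw]


def mean_impute(data: Sequence[Row]) -> List[Row]:
    """c) Imputation, here the simple column-mean variant."""
    column_means = [mean([v for v in column if v is not None]) for column in zip(*data)]
    return [[column_means[j] if v is None else v for j, v in enumerate(row)]
            for row in data]


raw = [["1", "2"], ["?", "4"], ["3", "?"]]
data = tolerate(raw)
print(data)               # [[1.0, 2.0], [None, 4.0], [3.0, None]]
print(ignore(data))       # [[1.0, 2.0]]
print(mean_impute(data))  # [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
```

Even on three rows, ignoring discards two thirds of the data, while imputation keeps every row at the cost of estimated values.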
“…Idri et al. [18] conducted a study to evaluate the impact of different missing data techniques on ABE [analogy-based estimation] using KNN. Huang et al. [17] performed an empirical study on cross-validation based KNN imputation for software quality datasets; although that study compared KNN imputation and mean imputation, it was specific to software quality datasets and did not focus on estimation or ABE. The related studies indicate the importance of imputing the missing data in past projects, especially for ABE.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this study, the optimal choice of K was determined by 10-fold cross-validation [37]. The optimal K, based on research from [35], [38], was used in this study.…”
Section: The Multiple Face Recognition Algorithm (mentioning)
confidence: 99%
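
In this excerpt, 10-fold cross-validation selects K for a k-NN classifier in a face recognition setting. The citing paper's features and distance are not given here, so the Python sketch below is generic. It assumes Euclidean distance and majority voting, and its function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature vector, class label)


def knn_predict(train: Sequence[Sample], query: Tuple[float, ...], k: int) -> str:
    """Majority vote among the k training samples closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(samples: Sequence[Sample], k: int, folds: int = 10, seed: int = 0) -> float:
    """Accuracy of a k-NN classifier under `folds`-fold cross-validation."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    correct = 0
    for f in range(folds):
        held_out = set(order[f::folds])
        train = [samples[i] for i in order if i not in held_out]
        for i in held_out:
            features, label = samples[i]
            correct += knn_predict(train, features, k) == label
    return correct / len(samples)


def choose_k(samples: Sequence[Sample], candidate_ks: Sequence[int],
             folds: int = 10, seed: int = 0) -> int:
    """Return the candidate K with the best cross-validated accuracy
    (ties go to the first candidate listed)."""
    return max(candidate_ks, key=lambda k: cv_accuracy(samples, k, folds, seed))


# Toy usage: two well-separated clusters, 20 samples in total.
samples = ([((0.1 * i, 0.1 * i), "a") for i in range(10)]
           + [((10 + 0.1 * i, 10 + 0.1 * i), "b") for i in range(10)])
print(choose_k(samples, [1, 3, 5]))  # 1 (every candidate reaches accuracy 1.0 here)
```

Each sample is held out exactly once, so every candidate K is judged on predictions for points it never trained on.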
“…The second solution is based on missing value imputation, which can provide estimates for missing values by reasoning from the observed data (i.e., complete data) [13, 14, 20]. …”
Section: Introduction (mentioning)
confidence: 99%
“…The experimental results have shown that missing value imputation is a better choice than case deletion when the incomplete datasets contain a certain amount of missing values. Model-based missing value imputation algorithms based on machine learning techniques, such as k-nearest neighbor, multilayer perceptron neural networks, and support vector machines, have recently been widely considered [14, 16, 21]. …”
Section: Introduction (mentioning)
confidence: 99%
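
One way to see why case deletion loses ground as missingness grows is that it drops a whole row whenever any single cell is missing. Suppose each of d features is missing independently with probability p, which is an MCAR-style assumption made here and not taken from the cited studies. The expected share of complete rows is then (1 - p)^d. The Python sketch below computes that share and checks it by simulation; the function names are hypothetical.

```python
import random


def expected_complete_fraction(p: float, d: int) -> float:
    """Share of rows with no missing cell when each of the d cells is missing
    independently with probability p (an MCAR-style assumption)."""
    return (1.0 - p) ** d


def simulated_complete_fraction(p: float, d: int, n_rows: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo check: draw n_rows rows and count the complete ones."""
    rng = random.Random(seed)
    complete = sum(all(rng.random() >= p for _ in range(d)) for _ in range(n_rows))
    return complete / n_rows


# With 5% of cells missing across 20 features, case deletion keeps only ~36% of rows.
print(round(expected_complete_fraction(0.05, 20), 3))  # 0.358
print(round(simulated_complete_fraction(0.05, 20), 3))
```

Imputation keeps all rows in this scenario, which is consistent with the excerpt's finding that it outperforms case deletion once missing values are no longer rare.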