“…Pandey and Tripathi [16] performed an empirical study focused on dealing with noise and class imbalance issues in software defect prediction. They show that if a dataset contains 10%–40% incorrectly labeled instances, the true positive rate (TPR) and true negative rate (TNR) are reduced by 20%–30%, and receiver operating characteristic (ROC) values are reduced by 40%–50%.…”
Section: Noise in SDP Datasets
“…Kim et al. [10], Seiffert et al. [15], Pandey and Tripathi [16], and Tantithamthavorn et al. [17] have all pointed out that noise in the resulting dataset can lead to severe degradation of model performance, while Khan et al. [18] showed that noise filters struggle to mitigate the problem once noise is present in the dataset.…”
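As an informal illustration of what such filtering looks like in practice, the sketch below implements a generic classification-based noise filter: instances whose labels disagree with out-of-fold ensemble predictions are flagged as potentially mislabeled. The synthetic dataset, the Random Forest committee, and the flag-and-drop policy are illustrative assumptions, not the specific filters evaluated by Khan et al. [18].

```python
# Minimal sketch of a classification-based noise filter (illustrative assumptions only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Hypothetical defect dataset: X holds module metrics, y holds 0/1 defect labels.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2],
                           random_state=42)

# Out-of-fold predictions: each instance is predicted by a model that never saw it.
oof_pred = cross_val_predict(
    RandomForestClassifier(n_estimators=100, random_state=42), X, y, cv=10)

# Flag instances whose recorded label contradicts the out-of-fold prediction.
suspected_noise = oof_pred != y
print(f"Flagged {suspected_noise.sum()} of {len(y)} instances as potentially mislabeled")

# A filter would now drop (or relabel) the flagged instances; the difficulty noted by
# Khan et al. [18] is that genuine minority-class defects are easily flagged as well.
X_clean, y_clean = X[~suspected_noise], y[~suspected_noise]
```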
Software defect prediction aims to identify potentially defective software modules so that limited quality assurance resources can be allocated more effectively. Practitioners typically do this with supervised models trained on historical data gathered by mining version control and issue tracking systems. Version control commits are linked to the issues they address, and if a linked issue is classified as a bug report, the change is considered bug fixing. The problem arises from the fact that issues are often incorrectly classified within issue tracking systems, which introduces noise into the gathered datasets. In this paper, we investigate the influence issue classification has on software defect prediction (SDP) dataset quality and on the resulting model performance. To do this, we mine data from 7 popular open-source repositories and create issue classification and software defect prediction datasets for each of them. We investigate issue classification using four different methods: a simple keyword heuristic, an improved keyword heuristic, the FastText model, and the RoBERTa model. Our results show that using the RoBERTa model for issue classification produces the best software defect prediction datasets, containing on average 14.3641% mislabeled instances. SDP models trained on these datasets achieve superior performance to those trained on SDP datasets created using the other issue classification methods in 65 out of 84 experiments, 55 of them with statistically significant differences. Furthermore, in 17 out of 28 experiments we could not show a statistically significant performance difference between SDP models trained on RoBERTa-derived software defect prediction datasets and those trained on datasets created from manually labeled issues.
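To make the labeling pipeline concrete, the sketch below shows what a simple keyword heuristic for issue classification could look like; the keyword list and function name are illustrative assumptions, not the exact heuristic evaluated in the paper. The improved heuristic, FastText, and RoBERTa replace this rule with progressively stronger text classifiers over the same issue titles and bodies.

```python
# Minimal sketch of a simple keyword heuristic for issue classification
# (illustrative only; the keyword list is an assumption, not the paper's heuristic).
BUG_KEYWORDS = ("bug", "fix", "error", "crash", "fail", "defect", "npe", "exception")

def is_bug_report(title: str, body: str = "") -> bool:
    """Label an issue as a bug report if any bug-indicative keyword appears."""
    text = f"{title} {body}".lower()
    return any(kw in text for kw in BUG_KEYWORDS)

# Commits linked to issues labeled as bugs are then treated as bug-fixing changes,
# which is exactly where issue misclassification leaks label noise into SDP datasets.
print(is_bug_report("NullPointerException when saving a project"))  # True
print(is_bug_report("Add dark mode to the settings page"))          # False
```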
“…A machine learning prediction model was applied to medical datasets for acute organ failure in critically ill patients (12). Traditional classification models assume that misclassifications are equally costly; an SDP model addressing noise and class imbalance was presented to remove noisy data, but it suffers from a high error rate when transforming unbalanced datasets (13). A classifier ensemble method for high-dimensional data classification was proposed to overcome baseline models, which perform poorly when predicting imbalanced data (14).…”
Objectives: To propose a suitable imbalanced-data classification model that splits the dataset into two new datasets, and to test the created imbalanced dataset with the prediction models. Methods: The imbalanced defect datasets are taken from the PROMISE repository and used for the performance evaluation. The results clearly demonstrate that the performance of three existing prediction classifier models, K-Nearest Neighbor (KNN), Naive Bayes (NB), and Back Propagation Network (BPN), is very susceptible to class imbalance, while Support Vector Machine (SVM) and Extreme Learning Machine (ELM) are more stable. Findings: The outcome of this research reveals that the applied SVM and ELM machine learning models improve defect prediction performance, scoring 29% higher than KNN and 19% higher than NB and BPN. Novelty: According to the findings of this comprehensive study, the proposed classification imbalance impact analysis method outperforms existing ones in transforming the original imbalanced dataset into a new dataset with an increasing imbalance rate and in selecting models to evaluate different predictions on the new dataset.
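A rough sketch of how such an imbalance-sensitivity comparison can be set up is shown below. A synthetic imbalanced dataset stands in for the PROMISE defect data, scikit-learn's MLPClassifier stands in for the back-propagation network, and ELM is omitted because scikit-learn has no built-in implementation; none of these choices reproduce the study's actual setup.

```python
# Illustrative comparison of classifier stability under class imbalance
# (synthetic data and model choices are assumptions, not the study's setup).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Roughly 10% defective instances, mimicking a typical SDP imbalance rate.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "BPN (MLP)": MLPClassifier(max_iter=500, random_state=0),
    "SVM": SVC(random_state=0),
}

# Balanced accuracy penalizes models that simply ignore the minority (defective) class.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
    print(f"{name:10s} balanced accuracy = {scores.mean():.3f}")
```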
“…Extensive experiments on four widely used datasets indicate that the ISDA-based solution performs better than eight state-of-the-art methods: support vector machine (SVM), Random Forest, random oversampling (ROS), random undersampling (RUS), TSCS (Liu et al., 2014), CDDL (Jing et al., 2014), CEL (Sun et al., 2012), subclass discriminant analysis (SDA), and AdaBoost.NC (S. Wang & Yao, 2013). Pandey and Tripathi (2021) dealt with the impact of noise and the class imbalance problem on five defect models by adding various noise levels (0–80%), which provides guidelines for the range of noise tolerable by baseline models. The results of 864 experiments over three public datasets show that Random Forest outperforms the other state-of-the-art techniques under AUC and has a high noise tolerance rate (30–40%).…”
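For readers who want to see the shape of such a noise-tolerance experiment, a minimal sketch is given below: an increasing fraction of training labels is flipped and the AUC of a Random Forest on a clean test set is tracked. The synthetic data, the flip rates, and the single model are illustrative assumptions, not a reproduction of Pandey and Tripathi's 864-experiment design.

```python
# Illustrative label-noise tolerance experiment (assumptions: synthetic data, one model).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, weights=[0.8, 0.2],
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

rng = np.random.default_rng(1)
for noise_level in (0.0, 0.1, 0.2, 0.4, 0.6, 0.8):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise_level   # flip this fraction of labels
    y_noisy[flip] = 1 - y_noisy[flip]
    clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_noisy)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"label noise {noise_level:4.0%} -> AUC on clean test set = {auc:.3f}")
```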
“…With regard to evaluation metrics, these studies, although very meaningful to the software engineering community, employed inconsistent metrics and even ignored false-positive indicators. Galar et al. (2012), Wang et al. (2016), and Pandey and Tripathi (2021) used only one measure, which does not provide a well-established reference for practitioners. Khoshgoftaar et al. (2014) and Diez-Pastor et al. (2015) employed comprehensive performance indicators, which leave readers confused about the basic indicators (e.g., recall and FPR).…”
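Since the snippet above turns on which basic indicators get reported, a small reminder of how recall and FPR fall out of the confusion matrix may help; the toy labels below are purely illustrative.

```python
# Recall and false-positive rate (FPR) from a confusion matrix (toy data for illustration).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
recall = tp / (tp + fn)   # share of real defects that were caught
fpr = fp / (fp + tn)      # share of clean modules that were falsely flagged
print(f"recall = {recall:.2f}, FPR = {fpr:.2f}")
```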
In defect prediction, a high false-positive rate (FPR) caused by class imbalance not only increases the workload of testing and development but also incurs unnecessary costs. Many defect models addressing class imbalance have been proposed to improve the accuracy of defect prediction, but their ability to reduce FPR is unclear. To address this, we first propose a BayesNet with adjustable weights, called WBN, to reduce the FPR in software defect prediction; the algorithm is independent of data preprocessing techniques. The mechanism of WBN is to change the sampling probability of misclassified instances when training the defect model, making the BayesNet focus more on false-alarm instances. We then investigate the FPR of five mainstream defect models designed for class imbalance and select them as comparison models to test the validity of our method. The experimental results on eight open-source projects show that a) our WBN effectively reduces FPR in in-version defect prediction (IVDP) and cross-version defect prediction (CVDP), with means of 0.384 and 0.322, respectively; b) compared with improved subclass discriminant analysis (ISDA), which has the lowest FPR among all control models, our WBN not only reduces FPR but also maintains recall, with a mean value of 0.797, whereas ISDA achieves an average recall of only 0.397; c) in CVDP, our WBN not only reduces FPR but also shows significant superiority over the five control defect models and the baseline. In addition, we found that the difference in class imbalance between the test set and the training set affects CVDP performance, and we recommend that practitioners use dedicated selection techniques to choose the best training dataset for CVDP from the defect data of historical versions.
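The core idea of the abstract, upweighting instances that produce false alarms so the model attends to them, can be sketched roughly as follows. GaussianNB stands in for the Bayesian network, and the fixed multiplicative weight update is an assumption for illustration; it is not the authors' exact WBN algorithm or its sampling-probability formulation.

```python
# Rough sketch of the reweighting idea behind WBN (GaussianNB and the update rule
# are illustrative stand-ins, not the authors' algorithm).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85, 0.15],
                           random_state=7)

weights = np.ones(len(y))
model = GaussianNB().fit(X, y, sample_weight=weights)

for _ in range(5):                          # a few reweighting rounds
    pred = model.predict(X)
    false_alarms = (pred == 1) & (y == 0)   # clean modules flagged as defective
    weights[false_alarms] *= 1.5            # focus the next fit on false alarms
    model = GaussianNB().fit(X, y, sample_weight=weights)

fpr = ((model.predict(X) == 1) & (y == 0)).sum() / (y == 0).sum()
print(f"training-set FPR after reweighting: {fpr:.3f}")
```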