Eliminating the high false-positive rate in defect prediction through BayesNet with adjustable weight

Zhao, Yanyang; Wang, Yawen; Zhang, Dalin; Gong, Yi

doi:10.1111/exsy.12977

Cited by 8 publications

(5 citation statements)

References 72 publications

(110 reference statements)

“…Similar to classification techniques, these indicators also take into account the requirements of quantity and type, and these indicators have been widely used in the previous defect prediction research [86,87]. Nevertheless, more other types of indicators are not used in this paper [88][89][90], such as Gmeasure and FPR, etc. Recently effort-aware indicators have also been proposed to measure the performance of SDP models [91][92][93].…”

Section: Classification Techniques (mentioning)

confidence: 99%

Evaluating the Impact of Data Transformation Techniques on the Performance and Interpretability of Software Defect Prediction Models

Zhao, Huang, Gong et al. 2023

IET Software

The performance of software defect prediction (SDP) models determines the priority of test resource allocation. Researchers also use interpretability techniques to gain empirical knowledge about software quality from SDP models. However, SDP methods designed in the past research rarely consider the impact of data transformation methods, simple but commonly used preprocessing techniques, on the performance and interpretability of SDP models. Therefore, in this paper, we investigate the impact of three data transformation methods (Log, Minmax, and Z-score) on the performance and interpretability of SDP models. Through empirical research on (i) six classification techniques (random forest, decision tree, logistic regression, Naive Bayes, K-nearest neighbors, and multilayer perceptron), (ii) six performance evaluation indicators (Accuracy, Precision, Recall, F1, MCC, and AUC), (iii) two interpretable methods (permutation and SHAP), (iv) two feature importance measures (Top-k feature rank overlap and difference), and (v) three datasets (Promise, Relink, and AEEEM), our results show that the data transformation methods can significantly improve the performance of the SDP models and greatly affect the variation of the most important features. Specifically, the impact of data transformation methods on the performance and interpretability of SDP models depends on the classification techniques and evaluation indicators. We observe that log transformation improves NB model performance by 7%–61% on the other five indicators with a 5% drop in Precision. Minmax and Z-score transformation improves NB model performance by 2%–9% across all indicators. However, all three transformation methods lead to substantial changes in the Top-5 important feature ranks, with differences exceeding 2 in 40%–80% of cases (detailed results available in the main content). 
Based on our findings, we recommend (1) considering the impact of data transformation methods on model performance and interpretability when designing SDP approaches, as transformations can improve model accuracy while potentially obscuring important features, which leads to challenges in interpretation; (2) conducting comparative experiments with and without the transformations to validate the effectiveness of proposed methods that are designed to improve prediction performance; and (3) tracking changes in the most important features before and after applying data transformation methods to ensure precise and traceable interpretability conclusions. Our study reminds researchers and practitioners of the need for comprehensive consideration even when using other similarly simple data processing methods.
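The three transformations named in this abstract (Log, Minmax, and Z-score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of `log(1 + x)` to keep zero-valued metrics defined, the population standard deviation, the function names, and the sample values are all assumptions made for the example.

```python
import math


def log_transform(values):
    # Assumption: log(1 + x) is used so that metrics equal to 0 stay defined.
    return [math.log1p(v) for v in values]


def minmax_transform(values):
    # Rescale to [0, 1]; a constant feature is mapped to all zeros.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def zscore_transform(values):
    # Center on the mean and divide by the population standard deviation.
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical, skewed code-metric column (e.g. lines of code per class).
loc = [0, 10, 20, 30, 1000]
transformed = {
    "log": log_transform(loc),
    "minmax": minmax_transform(loc),
    "zscore": zscore_transform(loc),
}
```

All three transformations are monotonic, so they preserve the ordering of values within a feature; what changes is the scale and the spread, which is one plausible reason distribution-sensitive classifiers such as Naive Bayes respond to them.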

“…The higher rate of false positives that the CART can generate has a detrimental effect on the amount of time and resources required to investigate false positives, which are non-buggy classes mistakenly labeled as buggy (Yanyang et al, 2022). Its poor F1score of 0.9114, which is exceeded by the RF (0.9169), provides additional evidence that this is the case.…”
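The two measures this statement refers to, the false-positive rate and the F1 score, can be computed from confusion-matrix counts. The sketch below assumes binary labels where 1 marks a buggy class and 0 a non-buggy one; the function names and the sample labels are hypothetical and are not taken from either paper.

```python
def confusion_counts(y_true, y_pred):
    # Count true/false positives and negatives for binary labels (1 = buggy).
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1  # non-buggy class mistakenly labeled as buggy
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def false_positive_rate(fp, tn):
    # FPR = FP / (FP + TN): share of non-buggy classes flagged as buggy.
    return fp / (fp + tn) if (fp + tn) else 0.0


def f1_score(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN): harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: 2 buggy classes, 4 non-buggy classes.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
fpr = false_positive_rate(fp, tn)  # 2 / (2 + 2) = 0.5
f1 = f1_score(tp, fp, fn)          # 2 / (2 + 2 + 1) = 0.4
```

Because F1 ignores true negatives, a model can score a high F1 and still have a high false-positive rate, so the two measures are best read together.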

Section: CART (Classification and Regression Trees) Results (mentioning)

confidence: 99%

Bug Prediction Models: seeking the most efficient

Marçal, Garcia 2024

Preprint

Choosing the most appropriate machine learning model for bug prediction tasks is critical. This paper primarily compares the predictive power of individual models versus ensemble models. We begin by experimenting with popular single machine learning models commonly used in bug prediction, like neural networks and support vector machines. Additionally, we test ensemble models that combine the unique strengths of individual models, aiming to maximize the benefits of each. Our evaluation is based on datasets containing historical development data from well-known open-source software projects. We rely on various metrics when assessing the models, encompassing accuracy, precision, recall, and F1 score. Our findings show that ensemble models tend to outperform single models, particularly when it comes to maintaining resilience across various datasets. Nevertheless, factors like the project's complexity, data availability, and computational resources all play a role in determining whether to use single or ensemble models. This paper offers a thorough analysis of the factors to consider when selecting machine learning models and approaches for bug prediction, providing valuable insights into the field. Furthermore, it offers practical advice for professionals, enabling them to make informed choices.

“…The samples of each category are randomly divided into 70% and 30%, […] samples are used to search the optimal hyperparameters of the SVM classifier, and the remaining […] samples are used to test the fault diagnosis accuracy of the PMDCMs. For each fault diagnosis, the accuracy of PMDCM’s model is checked by utilizing the combination of […] and […]; the 10-fold cross-validation [43] is utilized while the aim is to obtain stable fault diagnosis accuracy. The process of 10-fold cross-validation can be seen in Figure 13.…”

Section: 3 The Results Based on SVM Classifier (mentioning)

confidence: 99%

A Novel Supervised Filter Feature Selection Method Based on Gaussian Probability Density for Fault Diagnosis of Permanent Magnet DC Motors

Wang 2022

Sensors

For permanent magnet DC motors (PMDCMs), the amplitude of the current signals gradually decreases after the motor starts. In this work, the time domain features and time-frequency-domain features extracted from several successive segments of current signals make up a feature vector, which is adopted for fault diagnosis of PMDCMs. Many redundant features will lead to a decrease in diagnosis efficiency and increase the computation cost, so it is necessary to eliminate redundant features and features that have negative effects. This paper presents a novel supervised filter feature selection method for reducing data dimension by employing the Gaussian probability density function (GPDF) and named Gaussian vote feature selection (GVFS). To evaluate the effectiveness of the proposed GVFS, we compared it with the other five filter feature selection methods by utilizing the PMDCM’s data. Additionally, Gaussian naive Bayes (GNB), k-nearest neighbor algorithm (k-NN), and support vector machine (SVM) are utilized for the construction of fault diagnosis models. Experimental results show that the proposed GVFS has a better diagnostic effect than the other five feature selection methods, and the average accuracy of fault diagnosis improves from 97.89% to 99.44%. This paper lays the foundation of fault diagnosis for PMDCMs and provides a novel filter feature selection method.
