2008
DOI: 10.1109/tse.2008.35

Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings

Abstract: Software defect prediction strives to improve software quality and testing efficiency by constructing predictive classification models from code attributes to enable timely identification of fault-prone modules. Several classification models have been evaluated for this task. However, due to inconsistent findings regarding the superiority of one classifier over another and the usefulness of metric-based classification in general, more research is needed to improve convergence across studies and furt…

Cited by 1,045 publications (716 citation statements)
References 61 publications
“…Information has also been used by earlier studies [16], [17], [18] … NASA … classifier. SVM … MDP repository ….”
Section: Discussion
confidence: 99%
“…3. In the domain of software defect prediction, the defective class often represents less than 1 percent of the data points in total [1], [10], [11]. Reference [5] presented an example using the most imbalanced data set of the NASA Metric Data Program, PC2.…”
Section: Analysis of Precision Measure
confidence: 99%
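The imbalance concern raised above can be made concrete with a quick back-of-the-envelope sketch: at a defect rate near 1 percent (as in the PC2 example), even a classifier with a respectable hit rate and modest false-alarm rate yields low precision. All numbers below are assumptions for illustration, not figures from the cited studies.

```python
# Why precision is fragile on highly imbalanced defect data:
# a hypothetical module population with a ~1% defect rate.
total = 10_000
defective = 100                  # 1% fault-prone modules (assumed)
clean = total - defective

recall = 0.80                    # assumed true-positive rate
fpr = 0.10                       # assumed false-positive rate

tp = recall * defective          # 80 defective modules correctly flagged
fp = fpr * clean                 # 990 clean modules falsely flagged
precision = tp / (tp + fp)

print(round(precision, 3))       # → 0.075: most flagged modules are clean
```

Even with 80 percent recall, fewer than 8 percent of the flagged modules are actually defective, which is why the cited discussion treats precision on such data with caution.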
“…In software defect prediction studies, the Naïve Bayes classifier has also been empirically shown to be amongst the top-performing algorithms [17]. As shown in Table 6, the datasets are imbalanced.…”
Section: Construction of the Prediction Model
confidence: 99%
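As a rough illustration of the kind of model discussed above, here is a minimal from-scratch Gaussian Naïve Bayes sketch on synthetic, imbalanced "module metric" data. The features, distributions, and class ratio are all assumptions for the demo, not taken from the cited study.

```python
import math
import random

random.seed(0)

def make_module(defective):
    # Two illustrative code metrics (e.g. size, complexity); defective
    # modules are drawn from a shifted distribution (assumption).
    mu = (300, 20) if defective else (100, 5)
    return [random.gauss(mu[0], 50), random.gauss(mu[1], 3)], defective

# ~1% defective modules, mimicking the imbalance discussed above.
data = [make_module(True) for _ in range(10)] + \
       [make_module(False) for _ in range(990)]

def fit(rows):
    # Estimate per-class prior plus per-feature mean/variance.
    model = {}
    for label in (True, False):
        feats = [x for x, y in rows if y == label]
        prior = len(feats) / len(rows)
        stats = []
        for j in range(2):
            col = [f[j] for f in feats]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def predict(model, x):
    # Compare log-posteriors under a Gaussian likelihood per feature.
    def log_post(label):
        prior, stats = model[label]
        lp = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return lp
    return log_post(True) > log_post(False)

model = fit(data)
print(predict(model, [320, 22]))   # large, complex module
print(predict(model, [90, 4]))     # small, simple module
```

Note that the class prior heavily favors the clean class here, so a module is flagged only when the feature likelihoods strongly outweigh the imbalanced prior.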
“…As a result, it becomes possible both to increase the efficiency of the software testing phase and to deliver the software product to the market on time. Reported results in the software defect prediction literature suggest that further progress in defect prediction performance can be achieved by enriching the content of the input data that defect predictors learn from, rather than by using different algorithms or increasing the size of the input data [17], [15], [16]. Significant work in the literature can be grouped by focus: algorithm-driven approaches, data-size-driven approaches, and data-content-driven approaches.…”
confidence: 99%