2011
DOI: 10.1007/s10664-011-9173-9

Evaluating defect prediction approaches: a benchmark and an extensive comparison

Abstract: Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comp…

Cited by 487 publications (344 citation statements). References 68 publications.
“…Surprisingly, the results about precision in Table III are, on average, higher than those in Table II, implying that CPDP achieves better performance than WPDP in this example. This finding is different from the results of many prior studies [3,9]. Similarly, Figure 3 shows that in the scenario of CPDP DTR is also the best estimator when considering precision.…”
Section: A. Answer to RQ1 (contrasting)
Confidence: 85%
“…Interestingly, several recent studies with respect to software defect classification [3,9,10] have found that simple classifiers, e.g., Naïve Bayes and Logistic Regression, were able to perform well in both within-project and cross-project scenarios, though those complex ones always achieved high precision. As we know, newly created or unpopular software projects have little historical data available to train any classifiers, which is very similar to the typical problem cold start in recommender systems [11].…”
Section: Introduction (mentioning)
Confidence: 99%
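The excerpt above contrasts simple classifiers in within-project (WPDP) and cross-project (CPDP) defect prediction. As a rough illustration of that comparison, the sketch below trains Naive Bayes and Logistic Regression on one project and reports precision both within the same project and across a second project. It is not the setup of the cited studies: the data is synthetic, and the feature count, split sizes, and labelling rule are assumptions made purely for illustration; real experiments would use a benchmark such as the Bug Prediction Dataset referenced further down.

    # Illustrative sketch (not from the paper): WPDP vs. CPDP precision
    # for two simple defect-prediction classifiers on synthetic data.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score

    rng = np.random.default_rng(0)

    def synthetic_project(n_modules, shift=0.0):
        # Each row holds software metrics for one module; label 1 = defective.
        X = rng.normal(loc=shift, scale=1.0, size=(n_modules, 10))
        y = (X[:, 0] + X[:, 1] + rng.normal(size=n_modules) > 1.0).astype(int)
        return X, y

    X_a, y_a = synthetic_project(500)             # "project A"
    X_b, y_b = synthetic_project(500, shift=0.3)  # "project B", shifted distribution

    classifiers = {
        "NaiveBayes": GaussianNB(),
        "LogisticRegression": LogisticRegression(max_iter=1000),
    }

    for name, clf in classifiers.items():
        # Within-project: train and test on splits of the same project.
        X_tr, X_te, y_tr, y_te = train_test_split(X_a, y_a, test_size=0.3, random_state=0)
        wpdp = precision_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
        # Cross-project: train on project A, test on project B.
        cpdp = precision_score(y_b, clf.fit(X_a, y_a).predict(X_b))
        print(f"{name}: WPDP precision = {wpdp:.2f}, CPDP precision = {cpdp:.2f}")

Whether CPDP precision falls below WPDP precision here depends entirely on the synthetic distribution shift; the point of the sketch is only the shape of the comparison, not its outcome.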
“…The goal of our research is to empirically study performance of various classifiers for fault prediction on the data sets provided by Marco D'Ambros [45].…”
Section: Concluding Remarks and Future Work (mentioning)
Confidence: 99%
“…http://sourcerer.ics.uci.edu/ • Ultimate Debian Database (UDD) [11] http://udd.debian.org/ • Bug Prediction Dataset (BPD) [12], [13]:…”
Section: Research Datasets (mentioning)
Confidence: 99%