A Systematic Literature Review on Fault Prediction Performance in Software Engineering

Hall, Tracy; Beecham, Sarah; Bowes, David; Gray, David; Counsell, Steve

doi:10.1109/tse.2011.103

Cited by 901 publications

(671 citation statements)

References 212 publications

(96 reference statements)

Supporting

Mentioning

629

Contrasting

Unclassified

“…Since prediction results are categorical (faulty or not-faulty), we decided to test classifiers often used in software defect prediction [7,14,25,38], which are available in the basic package of KNIME:…”

Section: Prediction Models (mentioning)

confidence: 99%

Cost Effectiveness of Software Defect Prediction in an Industrial Project

Hryszko

Madeyski

2018

Foundations of Computing and Decision Sciences

Abstract. Software defect prediction is a promising approach aiming to increase software quality and, as a result, development pace. Unfortunately, the cost effectiveness of software defect prediction in industrial settings is not eagerly shared by the pioneering companies. In particular, this is the first attempt to investigate the cost effectiveness of using the DePress open source software measurement framework (jointly developed by Wroclaw University of Science and Technology, and Capgemini software development company) for defect prediction in commercial software projects. We explore whether defect prediction can positively impact an industrial software development project by generating profits. To meet this goal, we conducted a defect prediction and simulated potential quality assurance costs based on the best possible prediction results when using a default, non-tweaked DePress configuration, as well as the proposed Quality Assurance (QA) strategy. Results of our investigation are optimistic: we estimated that quality assurance costs can be reduced by almost 30% when the proposed approach will be used, while estimated DePress usage Return on Investment (ROI) is fully 73 (7300%), and Benefits Cost Ratio (BCR) is 74. Such promising results, being the outcome of the presented research, have caused the acceptance of continued usage of the DePress-based software defect prediction for actual industrial projects run by Volvo Group.
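
The ROI and BCR figures in the abstract above can be cross-checked with a short sketch. It assumes the common textbook definitions, ROI = (benefits - costs) / costs and BCR = benefits / costs; the abstract does not state which formulas the authors used, and the monetary values below are hypothetical, chosen only so that the ratios match the reported ones.

```python
def roi(benefits: float, costs: float) -> float:
    """Return on Investment as a ratio: net gain divided by cost (assumed definition)."""
    return (benefits - costs) / costs


def bcr(benefits: float, costs: float) -> float:
    """Benefit-Cost Ratio: total benefit divided by cost (assumed definition)."""
    return benefits / costs


# Hypothetical figures: one unit of cost yielding 74 units of benefit.
costs = 1.0
benefits = 74.0

print(f"ROI = {roi(benefits, costs):.0f} ({roi(benefits, costs):.0%})")
print(f"BCR = {bcr(benefits, costs):.0f}")
```

Under these definitions BCR always equals ROI + 1, which is consistent with the reported pair of 73 and 74.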

Section: Prediction Models (mentioning)

confidence: 99%

“…Prediction results -modules marked as defect-prone or non-defect prone, can be compared against actual data describing defect-prone module distribution and were used to build the confusion matrix (Table 3) -a commonly used tool for performance comparison across categorical studies [7]. …”

Section: Prediction Models (mentioning)

confidence: 99%
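
The confusion matrix mentioned in the citation statement above can be sketched as follows. This is a minimal illustration, not the citing study's code: the function name and the sample labels are hypothetical, and `True` is assumed to mean "defect-prone".

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count TP, FN, FP, TN for binary labels (True = defect-prone)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(True, True)],    # defect-prone, predicted defect-prone
        "FN": counts[(True, False)],   # defect-prone, predicted clean
        "FP": counts[(False, True)],   # clean, predicted defect-prone
        "TN": counts[(False, False)],  # clean, predicted clean
    }


# Hypothetical module labels, for illustration only.
actual = [True, True, False, False, False, True]
predicted = [True, False, False, True, False, True]
print(confusion_matrix(actual, predicted))
```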

Cost Effectiveness of Software Defect Prediction in an Industrial Project

Hryszko

Madeyski

2018

Foundations of Computing and Decision Sciences

“…A replication package for our study is publicly available for download (footnote 7). In the replication package, we provide: (i) the scripts for the extraction process on a specific dataset, (ii) the datasets used in our experimentation, and (iii) the raw data for the experimented predictors.…”

Section: Results (mentioning)

confidence: 99%

“…Table I), we see that in all cases where the traditional approach scores similarly to the GA one, there is a high portion of changed classes. For example, traditional RT performs well for predicting changes in GUAVA's releases R.13, R.14 and R.15 where around 50% of all classes were changed; in contrast, it performs poorly for predicting changes in release R.17, where only 15% of classes were changed.…”

Footnote 7: http://www.ifi.uzh.ch/seal/people/alexandru/downloads/smart-learning-rp.html

Section: Results (mentioning)

confidence: 99%

“…Researchers devised a number of defect and change prediction approaches to guide software maintenance activities by identifying the software artifacts that are more prone to being changed or to defects in the future [3]. All these approaches are based on statistical models, whose main difference is the diverse sets of predicting metrics and the underlying algorithms that learn from these metrics and make predictions [7], [8]. Examples of metrics that have been used for this prediction task are the Chidamber and Kemerer's object-oriented (CK) metrics [9], [10], [11], structural metrics [12] or process metrics [13].…”

Section: Background and Problem Description (mentioning)

confidence: 99%

Untitled

Supplemental Information 1: Replication Package

Abstract: Research has yielded approaches for predicting future changes and defects in software artifacts, based on historical information, helping developers allocate their (limited) resources effectively. Developers are unlikely to be able to focus on all predicted software artifacts, hence the ordering of predictions is important for choosing the right artifacts to concentrate on. We propose using a Genetic Algorithm (GA) for tailoring prediction models to prioritize classes with more changes/defects. We evaluate the approach on two models, regression tree and linear regression, predicting changes/defects between multiple releases of eight open source projects. Our results show that regression models calibrated by GA significantly outperform their traditional counterparts, improving the ranking of classes with more changes/defects by up to 48%. In many cases the top 10% of predicted classes can contain up to twice as many changes or defects.
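
One way to read the "top 10% of predicted classes" claim in the abstract above is as the share of all actual changes or defects that falls within the highest-ranked classes. The sketch below illustrates that reading only; the function name, parameters, and sample data are hypothetical, and it is not the evaluation measure or the GA calibration used in the paper.

```python
def share_in_top(predicted_scores, actual_counts, fraction=0.10):
    """Share of all actual changes/defects found in the top `fraction`
    of classes when ranked by predicted score (highest first)."""
    if len(predicted_scores) != len(actual_counts):
        raise ValueError("inputs must have the same length")
    total = sum(actual_counts)
    if total == 0:
        return 0.0
    # Inspect at least one class, even for very small systems.
    k = max(1, int(len(predicted_scores) * fraction))
    ranked = sorted(range(len(predicted_scores)),
                    key=lambda i: predicted_scores[i], reverse=True)
    return sum(actual_counts[i] for i in ranked[:k]) / total


# Hypothetical scores and defect counts for ten classes.
scores = [5, 1, 3, 2, 4, 0, 0, 0, 0, 0]
defects = [6, 0, 2, 1, 1, 0, 0, 0, 0, 0]
print(share_in_top(scores, defects, fraction=0.10))
```

A model that ranks defect-heavy classes higher yields a larger share for the same inspection budget, which is the property the abstract says the GA calibration improves.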

An under‐sampled software defect prediction method based on hybrid multi‐objective cuckoo search

Cai

Niu

Geng

et al. 2019

Concurrency and Computation

Both the problem of class imbalance in datasets and parameter selection of Support Vector Machine (SVM) are crucial to predict software defects. However, there is no one working to solve these problems synchronously at present. To tackle this problem, a hybrid multi-objective cuckoo search under-sampled software defect prediction model based on SVM (HMOCS-US-SVM) is proposed to solve the above two problems synchronously. Firstly, a hybrid multi-objective cuckoo search with dynamical local search (HMOCS) is utilized to select synchronously the non-defective sampling and optimize the parameters of SVM. Then, three under-sampled methods for decision region range are proposed to select the non-defective modules. In the simulation, the three indicators, including the false positive rate (pf), the probability of detection (pd), and G-mean, are employed to measure the performance of the proposed algorithm. In addition, eight datasets from Promise database are selected to verify the proposed software defect prediction model. Comparing with the result of eight prediction models, the proposed method comes into effect on solving software defect prediction problem.

KEYWORDS: class imbalance, hybrid multi-objective cuckoo search, software defect prediction, SVM, under-sampled

INTRODUCTION

With the advancement of network society, the software has been applied widely in the areas of life, such as the banking systems, biopharmaceutical engineering, and traffic signal command. Therefore, increasing attention has been paid to the quality of software products. [1] Generally speaking, software quality mainly includes five aspects: reliability, understandability, availability, maintainability, and effectiveness. [2] It is specially said that the reliability plays an important factor in leading to the software defects. [3] Software defects are the errors in the software development, which will lead to faults, failure, collapse, and even endanger the safety of human life and property. [4] Therefore, how to find defects as much as possible is particularly important. The core of software defect prediction (SDP) [5] is to extract the characteristic attributes as the obvious defect tendency of the historical software module, so as to predict the type or number of defects in the new software projects.

Class imbalance (CIB) in datasets is an unavoidable problem in SDP, which shows that 80% of the defects are concentrated on 20% of the modules. [6] However, the traditional classification algorithm [7] is built on the relative balance of datasets, which is not suitable for imbalanced datasets. It does mean that the classification algorithm is more inclined to the non-defected module. [8] Therefore, how to alleviate the imbalance of datasets is a major problem in SDP. To tackle the CIB problem, the existing research can be roughly divided into cost-sensitive method, [9] ensemble method, [10] and sampling method. [11]

• Cost-sensitive algorithms [12] solve the imbalanced problems by modifying algorithms, which means that the method improves the accuracy of classificatio...
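
The three indicators named in the abstract above (pf, pd, and G-mean) can be computed from confusion-matrix counts. The sketch below assumes the definitions commonly used in defect prediction studies, pd = TP / (TP + FN), pf = FP / (FP + TN), and G-mean = sqrt(pd * (1 - pf)); the excerpt does not spell out its formulas, and the counts are hypothetical. It also assumes both classes are present, so neither denominator is zero.

```python
import math


def probability_of_detection(tp: int, fn: int) -> float:
    """pd (recall): share of defective modules that were flagged."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """pf: share of non-defective modules that were wrongly flagged."""
    return fp / (fp + tn)


def g_mean(tp: int, fn: int, fp: int, tn: int) -> float:
    """Geometric mean of pd and (1 - pf) (assumed definition)."""
    pd_value = probability_of_detection(tp, fn)
    pf_value = false_positive_rate(fp, tn)
    return math.sqrt(pd_value * (1.0 - pf_value))


# Hypothetical counts for an imbalanced dataset: 10 defective, 100 clean.
tp, fn, fp, tn = 9, 1, 10, 90
print(f"pd = {probability_of_detection(tp, fn):.2f}")
print(f"pf = {false_positive_rate(fp, tn):.2f}")
print(f"G-mean = {g_mean(tp, fn, fp, tn):.2f}")
```

Unlike plain accuracy, G-mean drops toward zero when a classifier ignores the minority (defective) class, which is why it is often reported for imbalanced data.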
