NAFIPS 2007 - 2007 Annual Meeting of the North American Fuzzy Information Processing Society
DOI: 10.1109/nafips.2007.383813

Applying Novel Resampling Strategies To Software Defect Prediction

Cited by 129 publications (70 citation statements)
References 16 publications
“…As a result, except for dynamic features (i.e., the features collected through executing the original program), we also adopt some static code information as features to indicate dynamic mutant execution information. Such information could complement dynamic features for precise prediction of software engineering problems, which is also replete in the literature [37], [38], [39]. Actually, the entire static analysis area in the end aims to guarantee that dynamic program executions satisfy certain properties.…”
Section: Identified Features (mentioning)
confidence: 99%
“…Resampling is an effective way to mitigate the effects of imbalanced data in change classification [18], [38]. Different algorithms are used to change the distribution between the majority class and the minority class.…”
Section: Resampling (mentioning)
confidence: 99%
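
The excerpt above notes that resampling works by changing the class distribution between the majority (clean) and minority (buggy) classes before training. A minimal sketch of that idea, assuming the imbalanced-learn and scikit-learn Python packages (neither is named in the cited papers) and a synthetic stand-in for defect data:

```python
# A minimal sketch (not from the cited papers) of how random resampling
# changes the class distribution before training a defect/change classifier.
# Assumes the imbalanced-learn and scikit-learn packages; the data are synthetic.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# Synthetic stand-in for change data: roughly 5% "buggy" (minority) instances.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=42)
print("original:    ", Counter(y))   # e.g. Counter({0: 1898, 1: 102})

# Oversampling replicates minority instances until the classes balance.
X_over, y_over = RandomOverSampler(random_state=42).fit_resample(X, y)
print("oversampled: ", Counter(y_over))

# Undersampling instead discards majority instances.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
print("undersampled:", Counter(y_under))
```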
“…When the buggy rate is low, it is challenging to learn accurate models because there are fewer positive instances (i.e., buggy changes) for learning. Classifying imbalanced data is a known open challenge [18].…”
Section: Introduction (mentioning)
confidence: 99%
“…Since the standard classifiers are not applicable for imbalanced data, in order to deal with the problem of imbalanced data in this study, SMOTE was employed. This technique was proposed by Chawla et al [3] which is a famous re-sampling method in data pre-processing and has been applied in several articles, such as Pelayo and Dick [4], Zhao et al [5], Gu et al [6]. Using SMOTE technique, the number of samples in minority class can be increased by creating new synthetic samples instead of repeating them, so that the over-fitting problem in learning algorithm is avoided.…”
Section: Introduction (mentioning)
confidence: 99%
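
The statement above describes the key idea of SMOTE: new minority-class samples are synthesised by interpolating between a minority instance and one of its k nearest minority neighbours, rather than by duplicating existing instances. A minimal from-scratch sketch of that interpolation step, using NumPy and scikit-learn's NearestNeighbors; the function name smote_sketch and all parameters are illustrative, not taken from Chawla et al. or the cited paper:

```python
# Sketch of the SMOTE interpolation step: synthesise minority samples by
# interpolating towards a randomly chosen nearby minority neighbour.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Return n_new synthetic rows built from the minority-class matrix X_min."""
    rng = np.random.default_rng(rng)
    # k + 1 neighbours because each point's neighbour set includes itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))    # a random minority sample
        b = rng.choice(idx[a][1:])      # one of its k minority neighbours
        gap = rng.random()              # interpolation factor in [0, 1)
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic

# Hypothetical usage: append the synthetic rows to the training set as "buggy".
# X_buggy = X_train[y_train == 1]
# X_new = smote_sketch(X_buggy, n_new=len(X_buggy), k=5, rng=0)
```

Because each synthetic point lies between two real minority instances rather than on top of one, the classifier sees a broadened minority region instead of repeated copies, which is how the excerpt argues SMOTE avoids the over-fitting caused by plain duplication.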