2020
DOI: 10.1109/tse.2018.2876537
The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models

Abstract: Defect prediction models that are trained on class-imbalanced datasets (i.e., datasets in which defective and clean modules are not equally represented) are highly susceptible to producing inaccurate predictions. Prior research compares the impact of class rebalancing techniques on the performance of defect prediction models, but arrives at contradictory conclusions due to different choices of datasets, classification techniques, and performance measures. Such contradictory concl…
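To make the class rebalancing the abstract refers to concrete, the sketch below implements random oversampling, one common rebalancing technique (the paper compares several; this is not its specific method). The function name and the toy dataset are invented for illustration.

```python
import random

def random_oversample(features, labels, seed=0):
    """Randomly duplicate minority-class samples until every class is
    represented as often as the majority class (random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(xs + [rng.choice(xs) for _ in range(target - len(xs))])
        out_y.extend([y] * target)
    return out_x, out_y
```

On a toy dataset of 8 clean and 2 defective modules, the result contains 8 samples of each class; other techniques (random undersampling, SMOTE, etc.) alter the distribution differently.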

Cited by 218 publications (158 citation statements)
References 85 publications
“…PatchNet achieves an average AUC of 0.808. Since the new five test sets are highly imbalanced (only 15.79% patches are stable patches), we omit the other metrics (i.e., accuracy, precision, recall, and F1) [49], [53], [60]. We also trained PatchNet on a whole training dataset (i.e., 42,408 stable patches and 39,995 non-stable patches) and evaluated it on 184,481 non-stable patches.…”
Section: RQ2: How Effective Is PatchNet Compared To Other State-of-the-Art… (mentioning)
confidence: 99%
“…PatchNet achieves an AUC score of 0.805. Again we only use AUC as this dataset is highly imbalanced [49], [53], [60].…”
Section: RQ2: How Effective Is PatchNet Compared To Other State-of-the-Art… (mentioning)
confidence: 99%
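The preference for AUC on highly imbalanced test sets expressed in the quotes above can be made concrete: AUC is the probability that a randomly chosen positive example is ranked above a randomly chosen negative one, so it depends only on the ranking, not on the class ratio. A minimal rank-based sketch (ties counted as 1/2; the function name is illustrative):

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive example
    is scored above a randomly chosen negative one (ties count 1/2).
    Because it only depends on the ranking, it is insensitive to the
    class ratio, unlike accuracy, precision, recall, or F1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking yields 1.0 whether the positives are 50% or 0.1% of the test set, which is why accuracy-style metrics are omitted on the imbalanced patch datasets.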
“…Different subsets of metrics among training samples may pose a critical threat to validity when analysing and identifying the most important metrics. For example, prior work often applies post-hoc multiple comparison analyses (e.g., a Scott-Knott test) on the distributions of importance scores to identify statistically distinct ranks of the most important metrics [34,71,80]. However, such post-hoc analyses cannot be applied when feature selection techniques produce different subsets of metrics.…”
Section: Case Study Results (mentioning)
confidence: 99%
“…Finally, RFE provides the subset of metrics that yields the best performance according to an evaluation criterion (e.g., AUC). In our study, we select the AUC measure since it measures the discriminatory power of prediction models [23,44,60,71]. We use the implementation of recursive feature elimination provided by the rfe function of the caret R package [43].…”
Section: B. Wrapper-Based Feature Selection Techniques (mentioning)
confidence: 99%
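A minimal sketch of the backward-elimination loop behind RFE, under stated assumptions: a toy scoring model (rank modules by the sum of their selected metrics) evaluated with a rank-based AUC helper. caret's rfe() additionally cross-validates each candidate subset, which is omitted here; all names and data are invented for illustration.

```python
def auc(scores, labels):
    """Rank-based AUC helper (ties counted as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sum_score(rows, labels, subset):
    # Toy "model": rank each module by the sum of its selected metrics.
    return auc([sum(row[f] for f in subset) for row in rows], labels)

def rfe(rows, labels, evaluate):
    """Backward elimination: repeatedly drop the single feature whose
    removal gives the best score; return the best-scoring subset seen."""
    selected = list(range(len(rows[0])))
    best_subset, best_score = selected[:], evaluate(rows, labels, selected)
    while len(selected) > 1:
        candidates = [[f for f in selected if f != drop] for drop in selected]
        score, selected = max((evaluate(rows, labels, c), c) for c in candidates)
        if score >= best_score:
            best_score, best_subset = score, selected[:]
    return best_subset, best_score
```

With one predictive metric and one noise metric, the loop discards the noise column and keeps the subset that maximises AUC, mirroring what the quoted study obtains from caret's rfe.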