Better Data Labelling With EMBLEM (and how that Impacts Defect Prediction)

Tu, Huy; Yu, Zhe

doi:10.1109/tse.2020.2986415

Cited by 34 publications

(60 citation statements)

References 111 publications


“…Davies et al [13] AG-SZZ [14] Manually defined (researchers) 3 174 MA-SZZ da Costa et al [14] AG-SZZ [6], [9], [10], [15], [16], [38] Automatically computed metrics 10 2,637 RA-SZZ Neto et al [15] MA-SZZ [5], [6], [15] Manually defined (researchers) For each one, we specify (i) the algorithm on which it is based, (ii) references of works using it, (iii) the oracle used in the evaluation (how it was built, number of projects and bug fixes considered).…”

Section: L-szz and R-szz

mentioning

confidence: 99%

“…For example, Aman et al [9] studied the role of local variable names in fault-introducing commits and they used SZZ to retrieve such commits, while Palomba et al [17] focused on the impact of code smells, and used SZZ to determine whether an artifact was smelly when a fault was introduced. Many studies also leverage SZZ to evaluate defect prediction approaches [2]- [6], [19], [21], [26], [34], [38].…”

Section: B Szz In Software Engineering Research

mentioning

confidence: 99%

Evaluating SZZ Implementations Through a Developer-Informed Oracle

Rosa

Pascarella

Scalabrino

et al. 2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

The SZZ algorithm for identifying bug-inducing changes has been widely used to evaluate defect prediction techniques and to empirically investigate when, how, and by whom bugs are introduced. Over the years, researchers have proposed several heuristics to improve the SZZ accuracy, providing various implementations of SZZ. However, fairly evaluating those implementations on a reliable oracle is an open problem: SZZ evaluations usually rely on (i) the manual analysis of the SZZ output to classify the identified bug-inducing commits as true or false positives; or (ii) a golden set linking bug-fixing and bug-inducing commits. In both cases, these manual evaluations are performed by researchers with limited knowledge of the studied subject systems. Ideally, there should be a golden set created by the original developers of the studied systems. We propose a methodology to build a "developer-informed" oracle for the evaluation of SZZ variants. We use Natural Language Processing (NLP) to identify bug-fixing commits in which developers explicitly reference the commit(s) that introduced a fixed bug. This was followed by a manual filtering step aimed at ensuring the quality and accuracy of the oracle. Once built, we used the oracle to evaluate several variants of the SZZ algorithm in terms of their accuracy. Our evaluation helped us to distill a set of lessons learned to further improve the SZZ algorithm.


“…In this work, we aim to better data generation associated with building models for SATDs identification by reducing the labeling effort come from manual method [67] and improving the labeling quality of fully automated methods [84]. Moreover, our investigation also showed that the effort-aware method we propose, DebtFree, also performs statistically similar or even better than two SOTA works [84,55].…”

mentioning

confidence: 76%

“…Fig. 1: Workflows of DebtFree = Pseudo-Labeling (via Unsupervised Learning, i.e., CLA [47]) + Filtering (via CLA [47] or Jitterbug's Easy [84]) + Active Learning (via Emblem [67], Jitterbug's Hard [84], or this study's Falcon).…”

mentioning

confidence: 99%

DebtFree: Minimizing Labeling Cost in Self-Admitted Technical Debt Identification using Semi-Supervised Learning

Tu

2022

Preprint

Self Cite

Keeping track of and managing Self-Admitted Technical Debts (SATDs) is important for maintaining a healthy software project. The current active-learning SATD recognition tool requires manual inspection of 24% of the test comments on average to reach 90% recall. Among all the test comments, about 5% are SATDs. Human experts must therefore read almost five times as many comments as there are SATDs, which indicates the inefficiency of the tool. Moreover, human experts are still prone to error: 95% of the false-positive labels from previous work were actually true positives. To solve the above problems, we propose DebtFree, a two-mode framework based on unsupervised learning for identifying SATDs. In mode1, when the existing training data is unlabeled, DebtFree starts with an unsupervised learner to automatically pseudo-label the programming comments in the training data. In contrast, in mode2, where labels are available with the corresponding training data, DebtFree starts with a pre-processor that identifies the highly prone SATDs from the test dataset. Then, our machine learning model is employed to assist human experts in manually identifying the remaining SATDs. Our experiments on 10 software projects show that both models yield statistically significant improvement in effectiveness over the state-of-the-art automated and semi-automated models. Specifically, DebtFree can reduce the labeling effort by 99% in mode1 (unlabeled training data), and up to 63% in mode2 (labeled training data) while improving the current active learner's F1 relatively to almost 100%.

“…They found that the model trained on data labeled by B-SZZ and MA-SZZ does not cause a considerable reduction in accuracy but AG-SZZ causes a considerable decline in the accuracy. Tu et al [7] worked on better data labeling using a proposed approach known as EMBLEM. In this approach first AI labels the data and then human experts check the labels which in turn updates the AI.…”

Section: Previous Work 21 Just-in-time Software Defect Prediction

mentioning

confidence: 99%