2022
DOI: 10.1109/tse.2020.2986415
Better Data Labelling With EMBLEM (and how that Impacts Defect Prediction)

Abstract: Standard automatic methods for recognizing problematic development commits can be greatly improved via the incremental application of human+artificial expertise. In this approach, called EMBLEM, an AI tool first explores the software development process to label the commits that are most problematic. Humans then apply their expertise to check those labels (perhaps resulting in the AI updating the support vectors within its SVM learner). We recommend this human+AI partnership for several reasons. When a new domai…
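The label-then-check loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train_linear_svm`, `score`, `review_loop`), the `ask_human` callback, the batch size, and the number of rounds are all hypothetical, and a small Pegasos-style linear SVM written in pure Python stands in for whatever SVM learner EMBLEM actually uses. The sketch also assumes that each round sends the commits the model scores as most problematic to the human reviewer, and that labels are +1 (problematic) or -1 (not problematic).

```python
import random


def train_linear_svm(X, y, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (illustrative stand-in)."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # feature weights plus a trailing bias term
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: move toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def score(w, x):
    """Linear score; larger means the model rates the commit as more problematic."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))


def review_loop(X, ask_human, seed_ids, batch=2, rounds=3):
    """Alternate between model ranking and human checking (hypothetical loop)."""
    # Assumption: a few commits are labeled by a human up front.
    labels = {i: ask_human(i) for i in seed_ids}

    def fit():
        ids = sorted(labels)
        return train_linear_svm([X[i] for i in ids], [labels[i] for i in ids])

    w = fit()
    for _ in range(rounds):
        pool = [i for i in range(len(X)) if i not in labels]
        if not pool:
            break
        # The model proposes the commits it considers most problematic...
        pool.sort(key=lambda i: score(w, X[i]), reverse=True)
        for i in pool[:batch]:
            labels[i] = ask_human(i)  # ...and a human confirms or corrects them.
        w = fit()  # retrain on the enlarged set of human-checked labels
    return w, labels
```

Here `X` is a list of per-commit feature lists and `ask_human(i)` returns the reviewer's label for commit `i`. With a library SVM the retraining step would replace `train_linear_svm`; the loop structure would stay the same.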

Citations: cited by 34 publications (60 citation statements)
References: 111 publications
“…Davies et al [13] AG-SZZ [14] Manually defined (researchers) 3 174 MA-SZZ da Costa et al [14] AG-SZZ [6], [9], [10], [15], [16], [38] Automatically computed metrics 10 2,637 RA-SZZ Neto et al [15] MA-SZZ [5], [6], [15] Manually defined (researchers) For each one, we specify (i) the algorithm on which it is based, (ii) references of works using it, (iii) the oracle used in the evaluation (how it was built, number of projects and bug fixes considered).…”
Section: L-szz and R-szz
mentioning
confidence: 99%
“…For example, Aman et al [9] studied the role of local variable names in fault-introducing commits and they used SZZ to retrieve such commits, while Palomba et al [17] focused on the impact of code smells, and used SZZ to determine whether an artifact was smelly when a fault was introduced. Many studies also leverage SZZ to evaluate defect prediction approaches [2]- [6], [19], [21], [26], [34], [38].…”
Section: B Szz In Software Engineering Research
mentioning
confidence: 99%
“…In this work, we aim to better data generation associated with building models for SATDs identification by reducing the labeling effort come from manual method [67] and improving the labeling quality of fully automated methods [84]. Moreover, our investigation also showed that the effort-aware method we propose, DebtFree, also performs statistically similar or even better than two SOTA works [84,55].…”
mentioning
confidence: 76%
“…Fig. 1: Workflows of DebtFree = Pseudo-Labeling (via Unsupervised Learning, i.e., CLA [47]) + Filtering (via CLA [47] or Jitterbug's Easy [84]) + Active Learning (via Emblem [67], Jitterbug's Hard [84], or this study's Falcon).…”
mentioning
confidence: 99%
“…They found that the model trained on data labeled by B-SZZ and MA-SZZ does not cause a considerable reduction in accuracy but AG-SZZ causes a considerable decline in the accuracy. Tu et al [7] worked on better data labeling using a proposed approach known as EMBLEM. In this approach first AI labels the data and then human experts check the labels which in turn updates the AI.…”
Section: Previous Work 21 Just-in-time Software Defect Prediction
mentioning
confidence: 99%