Automatic Defect Categorization

Thung, Ferdian; Lo, David; Jiang, Lingxiao

doi:10.1109/wcre.2012.30

Cited by 103 publications

(74 citation statements)

References 31 publications

“…These 500 defects are manually labeled by Thung et al. [33]. Our experiment shows that the effectiveness of our proposed approach is promising.…”

Section: Introduction (mentioning)

confidence: 76%

“…In this paper, we extend the defect categorization work by Thung et al. [33]. In that work, defects are categorized into three families: control and data flow, structural, and non-code¹.…”

Section: Introduction (mentioning)

confidence: 91%

“…In this work, we focus on the defect families defined by Thung et al. [33]. There are three defect families: control and data flow, structural, and non-code.…”

Section: A Defect Classification (mentioning)

confidence: 99%

“…Thung et al. consider the 3 defect families rather than the 7 original ODC defect types, since building a machine learning solution that accurately classifies defects into 7 types is much harder than building one that accurately classifies defects into 3 families [33]. A multi-class classification problem gets much harder as the number of classes (in our case, defect types) increases.…”

Section: A Defect Classification (mentioning)

confidence: 99%

“…Thus, several studies have proposed approaches that automate the defect categorization process [14], [33]. These approaches typically rely on supervised learning, in which a portion of the defects is manually labelled and fed to a machine learning technique that learns a discriminative model, which is then used to automatically classify the other, unlabelled defects.…”

Section: Introduction (mentioning)

confidence: 99%
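
The supervised workflow described in the excerpt above (manually label a portion of the defects, learn a model from them, then classify the rest) can be sketched in a few lines. This is a minimal illustration only, assuming a multinomial Naive Bayes classifier over bag-of-words defect descriptions; the example texts and the helpers `tokenize`, `train` and `predict` are hypothetical, and neither the features nor the learner are claimed to be those used in [14] or [33]. Only the three family names come from Thung et al.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled defects (text, family). The family names follow
# Thung et al.; the descriptions are invented for illustration.
LABELLED = [
    ("null pointer dereference when loop index exceeds array bound", "control and data flow"),
    ("wrong condition in if statement skips initialization", "control and data flow"),
    ("method moved to wrong class breaks interface hierarchy", "structural"),
    ("missing override of abstract method in subclass", "structural"),
    ("typo in documentation and build script configuration", "non-code"),
    ("update license text and readme documentation", "non-code"),
]


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def train(examples):
    """Fit multinomial Naive Bayes counts from the manually labelled examples."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)  # per-family token frequencies
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the family with the highest log-posterior for an unlabelled defect."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the family
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(LABELLED)
for report in ["if condition wrong inside loop", "update the readme documentation"]:
    print(report, "->", predict(model, report))
```

Tokens absent from the labelled set are ignored at prediction time, so a description made up entirely of unseen words falls back to the class priors.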

Active Semi-supervised Defect Categorization

Thung

2015

2015 IEEE 23rd International Conference on Program Comprehension

Self Cite

Abstract: Defects are an inseparable part of software development and evolution. To better comprehend problems affecting a software system, developers often store historical defects, and these defects can be categorized into families. IBM proposed Orthogonal Defect Classification (ODC), which includes various classifications of defects based on a number of orthogonal dimensions (e.g., symptoms and semantics of defects, root causes of defects, etc.). To help developers categorize defects, several approaches that employ machine learning have been proposed in the literature. Unfortunately, these approaches often require developers to manually label a large number of defect examples. In practice, manually labelling a large number of examples is both time-consuming and labor-intensive. Thus, reducing the onerous burden of manual labelling while still achieving good performance is crucial for the adoption of such approaches. To deal with this challenge, in this work, we propose an active semi-supervised defect prediction approach. It works by actively selecting a small subset of diverse and informative defect examples to label (i.e., active learning), and by making use of both labelled and unlabelled defect examples in the prediction model learning process (i.e., semi-supervised learning). Using this principle, our approach is able to learn a good model while minimizing the manual labelling effort. To evaluate the effectiveness of our approach, we make use of a benchmark dataset that contains 500 defects from three software systems that have been manually labelled into several families based on ODC. We investigate our approach's ability to achieve good classification performance, measured in terms of weighted precision, recall, F-measure, and AUC, when only a small number of manually labelled defect examples are available.
Our experimental results show that our active semi-supervised defect categorization approach is able to achieve a weighted precision, recall, F-measure, and AUC of 0.651, 0.669, 0.623, and 0.710, respectively, when only 50 defects are manually labelled. Furthermore, it outperforms an existing active multiclass classification algorithm, proposed in the machine learning community, by a substantial margin.
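
The abstract does not spell out how its weighted metrics are computed. A common convention, assumed in the sketch below (it matches scikit-learn's `average='weighted'` option), computes precision, recall and F-measure per class and averages them, weighting each class by its number of true instances. The labels and predictions are invented for illustration, the short class names are hypothetical abbreviations of the three families, and AUC is left out.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F-measure over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    n = len(y_true)
    wp = wr = wf = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        prec = tp / predicted_c if predicted_c else 0.0
        rec = tp / support[c] if support[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = support[c] / n  # weight = share of true instances in class c
        wp += w * prec
        wr += w * rec
        wf += w * f1
    return wp, wr, wf


# Hypothetical abbreviations: CDF = control and data flow, S = structural,
# NC = non-code. These labels and predictions are invented.
y_true = ["CDF", "CDF", "CDF", "CDF", "S", "S", "NC", "NC"]
y_pred = ["CDF", "CDF", "CDF", "S", "S", "CDF", "NC", "CDF"]
print(weighted_prf(y_true, y_pred))
```

Under this convention weighted recall equals overall accuracy (5 of 8 predictions are correct here). Because the weighted F-measure averages per-class F-measures instead of combining the weighted precision and recall, it can fall below both; that pattern is consistent with, though not proof of, the reported 0.623 against 0.651 and 0.669.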

SQA – Definitions and Concepts

Galin

2018

Software Quality: Concepts and Practice

BUGSJS: a benchmark and taxonomy of JavaScript bugs

Gyimesi, Vancsics, Stocco, et al.

2020

Software Testing, Verification and Reliability

JavaScript is a popular programming language that is also error-prone due to its asynchronous, dynamic, and loosely typed nature. In recent years, numerous techniques have been proposed for analyzing and testing JavaScript applications. However, our survey of the literature in this area revealed that the proposed techniques are often evaluated on different datasets of programs and bugs. The lack of a commonly used benchmark limits the ability to perform fair and unbiased comparisons for assessing the efficacy of new techniques. To fill this gap, we propose BUGSJS, a benchmark of 453 real, manually validated JavaScript bugs from 10 popular JavaScript server-side programs, comprising 444k lines of code (LOC) in total. Each bug is accompanied by its bug report, the test cases that expose it, as well as the patch that fixes it. We extended BUGSJS with a rich web interface for visualizing and dissecting the bugs' information, as well as a programmable API to access the faulty and fixed versions of the programs and to execute the corresponding test cases, which facilitates conducting highly reproducible empirical studies and comparisons of JavaScript analysis and testing tools. Moreover, following a rigorous procedure, we performed a classification of the bugs according to their nature. Our internal validation shows that our taxonomy is adequate for characterizing the bugs in BUGSJS. We discuss several ways in which the resulting taxonomy and the benchmark can help direct researchers interested in automated testing of JavaScript applications.
