How to “DODGE” Complex Software Analytics

Agrawal, Amritanshu; Wei, Fu; Chen, Di; Shen, Xipeng; Menzies, Tim

doi:10.1109/tse.2019.2945020

Cited by 42 publications

(56 citation statements)

References 72 publications

“…learning on static code attributes such as C.K. and McCabe metrics) [1], [3], [18], [20], [33], [36], [44], [57], [64], [65], [72], [74], [80], [96] that are more granulated and high-dimensional.…”

Section: Future Work (mentioning)

confidence: 99%

Better Data Labelling With EMBLEM (and how that Impacts Defect Prediction)

2022

IEEE Trans. Software Eng.

Self Cite

Standard automatic methods for recognizing problematic development commits can be greatly improved via the incremental application of human+artificial expertise. In this approach, called EMBLEM, an AI tool first explores the software development process to label commits that are most problematic. Humans then apply their expertise to check those labels (perhaps resulting in the AI updating the support vectors within their SVM learner). We recommend this human+AI partnership, for several reasons. When a new domain is encountered, EMBLEM can learn better ways to label which comments refer to real problems. Also, in studies with 9 open source software projects, labelling via EMBLEM's incremental application of human+AI is at least an order of magnitude cheaper than existing methods (≈ eight times). Further, EMBLEM is very effective. For the data sets explored here, EMBLEM's better labelling methods significantly improved Popt20 and G-score performance in nearly all the projects studied.

Table 1: This paper argues against using keywords like these as a method for labelling a commit as "buggy".

“…Finally, data mining technology keeps evolving. Agrawal et al [1] recently argued that for any dataset where FFTs are effective, that there is a better algorithm (that they call DODGE( )). Moreover, Yang et al [109] designed the deep belief network to generate more quality metrics from the given metrics by Kamei et al [48].…”

Section: Future Work (mentioning)

confidence: 99%

“…Our preliminary analysis shows that the model building with the smallest defect dataset still takes longer than two days. Hence, using semantic features for line-level defect prediction remains challenging.…”

Section: Challenges In Machine Learning-based Approaches (mentioning)

confidence: 99%

“…) [3,4]. A d2h value of 0 indicates that an approach achieves a perfect identification, i.e., an approach can identify all defective lines (Recall = 1) without any false positives (FAR = 0).…”

Section: Evaluation Measures (mentioning)

confidence: 99%
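The quoted statement only says what a d2h ("distance to heaven") of 0 means. A minimal sketch of how such a measure could be computed follows, assuming d2h is the Euclidean distance from the ideal point (Recall = 1, FAR = 0), normalized to the range [0, 1]. That formula and the function name `d2h` are assumptions for illustration and are not given in the quoted text.

```python
import math


def d2h(recall: float, far: float) -> float:
    """Distance from the ideal point (recall=1, far=0), scaled to [0, 1].

    Assumed definition: sqrt(((1 - recall)^2 + (0 - far)^2) / 2).
    """
    if not (0.0 <= recall <= 1.0 and 0.0 <= far <= 1.0):
        raise ValueError("recall and far must both lie in [0, 1]")
    return math.sqrt(((1.0 - recall) ** 2 + (0.0 - far) ** 2) / 2.0)


if __name__ == "__main__":
    # Perfect identification: every defective line found, no false alarms.
    print(d2h(1.0, 0.0))
    # Worst case: nothing found, everything flagged falsely.
    print(d2h(0.0, 1.0))
```

Under this assumed definition, lower values are better, and the perfect case described in the quote evaluates to 0.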

Predicting Defective Lines Using a Model-Agnostic Technique

Wattanakriengkrai, Thongtanunam, Tantithamthavorn, et al.

2022

IEEE Trans. Software Eng.

Defect prediction models are proposed to help a team prioritize the source code files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste effort on the whole file while only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of lines of a file are defective. Hence, in this work, we propose a novel framework (called LINE-DP) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information on why the model makes such a prediction. Broadly speaking, our LINE-DP first builds a file-level defect model using code token features. Then, our LINE-DP uses a state-of-the-art model-agnostic technique (i.e., LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Then, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that our LINE-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20%LOC recall of 0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation shows that our LINE-DP requires an average computation time of 10 seconds including model construction and defective line identification time. In addition, we find that 63% of defective lines that can be identified by our LINE-DP are related to common defects (e.g., argument change, condition change). These results suggest that our LINE-DP can effectively identify defective lines that contain common defects while requiring a smaller amount of inspection effort and a manageable computation cost. The contribution of this paper builds an important step towards line-level defect prediction by leveraging a model-agnostic technique.
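The last step described in the abstract, predicting a line as defective when it contains a risky token, can be sketched as below. This is only an illustration under stated assumptions: the set of risky tokens is taken as given (in LINE-DP it would come from LIME applied to the file-level model, which is not reproduced here), and the function name `flag_lines` and the identifier-based tokenizer are hypothetical choices, not the authors' implementation.

```python
import re

# Hypothetical tokenizer: treat identifier-like words as code tokens.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*")


def flag_lines(source: str, risky_tokens: set[str]) -> list[int]:
    """Return 1-based numbers of lines containing at least one risky token."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        tokens = set(TOKEN_PATTERN.findall(line))
        if tokens & risky_tokens:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    example = "int total = 0;\ntotal = compute(total);\nreturn total;"
    # Assume an upstream explainer marked "compute" as risky.
    print(flag_lines(example, {"compute"}))
```

Matching whole tokens rather than substrings keeps a risky token such as `compute` from flagging a line that only contains `computed`.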

“…Perhaps if we augmented early life cycle defect predictors with a little transfer learning (from other projects [43]), then we could generate better performing predictors. Further to the last point, another interesting avenue of future research might be hyper-parameter optimization (HPO) [20], [80], [81]. HPO is often not applied in software analytics due to its computational complexity.…”

Section: III Conclusion (mentioning)

confidence: 99%

Early Life Cycle Software Defect Prediction. Why? How?

Shrikanth, Majumder

2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Self Cite

Many researchers assume that, for software analytics, "more data is better." We write to show that, at least for learning defect predictors, this may not be true. To demonstrate this, we analyzed hundreds of popular GitHub projects. These projects ran for 84 months and contained 3,728 commits (median values). Across these projects, most of the defects occur very early in their life cycle. Hence, defect predictors learned from the first 150 commits and four months perform just as well as anything else. This means that, at least for the projects studied here, after the first few months, we need not continually update our defect prediction models. We hope these results inspire other researchers to adopt a "simplicity-first" approach to their work. Some domains require a complex and data-hungry analysis. But before assuming complexity, it is prudent to check the raw data looking for "short cuts" that can simplify the analysis.
