Are Bug Reports Enough for Text Retrieval-Based Bug Localization?

Mills, Chris; Pantiuchina, Jevgenija; Parra, Esteban; Bavota, Gabriele; Haiduc, Sonia

doi:10.1109/icsme.2018.00046

Cited by 36 publications

(56 citation statements)

References 34 publications


“…Herbold et al (2020) independently confirmed the results by Herzig et al (2013) and demonstrated how this and other issues negatively impact defect prediction data. However, while both Herzig et al (2013) and Herbold et al (2020) study the impact of mislabels of defect prediction, any software repository mining research that studies defects suffers from similar consequences, e.g., bug localization (e.g., Marcus et al 2004, Lukins et al 2008, Rao and Kak 2011, Mills et al 2018). In the literature, there are several approaches that try to address the issue of mislabels in issue systems through machine learning.…”

Section: Related Work (mentioning)

confidence: 99%

“…If a feature request is misclassified as bug this may hold up a release. Second, there are many Mining Software Repositories (MSR) approaches that rely on issue types, especially the issue type bug, e.g., for bug localization (e.g., Marcus et al 2004, Lukins et al 2008, Rao and Kak 2011, Mills et al 2018) or the labeling of commits as defective with the SZZ algorithm (Śliwerski et al 2005) and the subsequent use of these labels, e.g., for defect prediction (e.g., Hall et al 2012, Hosseini et al 2017) or the creation of fine-grained data (e.g., Just et al 2014). Mislabelled issues threaten the validity of the research and would also degenerate the performance of approaches based on this data that are implemented in tools and used by practitioners.…”

Section: Introduction (mentioning)

confidence: 99%


On the feasibility of automated prediction of bug and non-bug issues

Herbold

Trautsch

2020

Empir Software Eng


Context: Issue tracking systems are used to track and describe tasks in the development process, e.g., requested feature improvements or reported bugs. However, past research has shown that the reported issue types often do not match the description of the issue. Objective: We want to understand the overall maturity of the state of the art of issue type prediction with the goal to predict if issues are bugs and evaluate if we can improve existing models by incorporating manually specified knowledge about issues. Method: We train different models for the title and description of the issue to account for the difference in structure between these fields, e.g., the length. Moreover, we manually detect issues whose description contains a null pointer exception, as these are strong indicators that issues are bugs. Results: Our approach performs best overall, but not significantly different from an approach from the literature based on the fastText classifier from Facebook AI Research. The small improvements in prediction performance are due to structural information about the issues we used. We found that using information about the content of issues in form of null pointer exceptions is not useful. We demonstrate the usefulness of issue type prediction through the example of labelling bugfixing commits. Conclusions: Issue type prediction can be a useful tool if the use case allows either for a certain amount of missed bug reports or the prediction of too many issues as bug is acceptable.

“…We can simplify Equation (50) analogously to Equation (47) and get a constant cost model with a 1-to-m mapping between artifacts and defects…”

Section: Initializations Of the Cost Model (mentioning)

confidence: 99%

On the Costs and Profit of Software Defect Prediction

Herbold

2021

IEEE Trans. Software Eng.


Defect prediction can be a powerful tool to guide the use of quality assurance resources. However, while lots of research covered methods for defect prediction as well as methodological aspects of defect prediction research, the actual cost saving potential of defect prediction is still unclear. Within this article, we close this research gap and formulate a cost model for software defect prediction. We derive mathematically provable boundary conditions that must be fulfilled by defect prediction models such that there is a positive profit when the defect prediction model is used. Our cost model includes aspects like the costs for quality assurance, the costs of post-release defects, the possibility that quality assurance fails to reveal predicted defects, and the relationship between software artifacts and defects. We initialize the cost model using different assumptions, perform experiments to show trends of the behavior of costs on real projects. Our results show that the unrealistic assumption that defects only affect a single software artifact, which is a standard practice in the defect prediction literature, leads to inaccurate cost estimations. Moreover, the results indicate that thresholds for machine learning metrics are also not suited to define success criteria for software defect prediction. Index Terms: defect prediction, costs, return on investment.


“…This shows that in order to resolve ≈ 40% of bug report, a developer has to fix code in more than one source file, which is not uncommon [32,22]. It is also known that not all files in a bug-fixing commit may be directly related to that bug report [36]. However, the authors of the original study took all files in a bug-fixing commit.…”

Section: Data Preprocessing (mentioning)

confidence: 99%

“…There also exists a small percentage (≈ 0.07%) of outlier files associated with more than a 100 bugs. Note that this does not necessarily imply that these files are error-prone: they may not be related to an actual fix [36], as discussed above. Using the same rationale as in the previous paragraph, we retain all of the files.…”

Section: Data Preprocessing (mentioning)

confidence: 99%

On Usefulness of the Deep-Learning-Based Bug Localization Models to Practitioners

Polisetty

Miranskyy

Başar

2019

Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering


Background: Developers spend a significant amount of time and efforts to localize bugs. In the literature, many researchers proposed state-of-the-art bug localization models to help developers localize bugs easily. The practitioners, on the other hand, expect a bug localization tool to meet certain criteria, such as trustworthiness, scalability, and efficiency. The current models are not capable of meeting these criteria, making it harder to adopt these models in practice. Recently, deep-learning-based bug localization models have been proposed in the literature, which show a better performance than the state-of-the-art models. Aim: In this research, we would like to investigate whether deep learning models meet the expectations of practitioners or not. Method: We constructed a Convolution Neural Network and a Simple Logistic model to examine their effectiveness in localizing bugs. We train these models on five open source projects written in Java and compare their performance with the performance of other state-of-the-art models trained on these datasets. Results: Our experiments show that although the deep learning models perform better than classic machine learning models, they meet the adoption criteria set by the practitioners only partially. Conclusions: This work provides evidence that the practitioners should be cautious while using the current state of the art models for production-level use-cases. It also highlights the need for standardization of performance benchmarks to ensure that bug localization models are assessed equitably and realistically.
