In previous work by Alipour et al., a methodology was proposed for detecting duplicate bug reports by comparing the textual content of bug reports to subject-specific contextual material, namely lists of software-engineering terms, such as non-functional requirements and architecture keywords. When a bug report contains a word in these word-list contexts, the bug report is considered to be associated with that context, and this information tends to improve bug-deduplication methods. In this paper, we propose a method to partially automate the extraction of contextual word lists from software-engineering literature. Evaluating this software-literature context method on real-world bug reports produces useful results that indicate this semi-automated method has the potential to substantially decrease the manual effort used in contextual bug deduplication while suffering only a minor loss in accuracy.

Index Terms: duplicate bug reports; information retrieval; software engineering textbooks; machine learning; software literature; documentation.

978-1-4799-8469-5/15/$31.00 © 2015 IEEE. SANER 2015, Montréal, Canada.
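The word-list idea described above can be illustrated with a minimal sketch. The context names, the words in each list, and the helper functions `context_features` and `shared_contexts` are hypothetical and chosen only for illustration; they are not the word lists or the implementation used in the paper.

```python
import re

# Hypothetical contextual word lists (assumption: not the lists from the paper).
CONTEXTS = {
    "performance": {"latency", "throughput", "slow", "memory"},
    "security": {"password", "encryption", "authentication"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def context_features(report_text, contexts=CONTEXTS):
    """Count, per context, how many of its words occur in the report."""
    tokens = tokenize(report_text)
    return {name: len(tokens & words) for name, words in contexts.items()}


def shared_contexts(report_a, report_b, contexts=CONTEXTS):
    """Return the contexts with which both reports are associated."""
    features_a = context_features(report_a, contexts)
    features_b = context_features(report_b, contexts)
    return sorted(
        name for name in contexts
        if features_a[name] > 0 and features_b[name] > 0
    )


if __name__ == "__main__":
    print(context_features("App is slow and uses too much memory"))
    print(shared_contexts("Login password rejected", "Password field is slow"))
```

In a deduplication pipeline, per-context counts of this kind would typically be combined with textual similarity scores as input features for a classifier; that combination is not shown here.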
Bug deduplication, i.e., recognizing bug reports that refer to the same problem, is a challenging task in the software-engineering life cycle. Researchers have proposed several methods, primarily relying on information-retrieval techniques. Our work, motivated by the intuition that domain knowledge can provide the relevant context to enhance effectiveness, attempts to improve the use of information retrieval by augmenting it with software-engineering knowledge. In our previous work, we proposed the software-literature-context method for using software-engineering literature as a source of contextual information to detect duplicates. If bug reports relate to similar subjects, they have a better chance of being duplicates. Our method, being largely automated, has the potential to substantially decrease the level of manual effort involved in conventional techniques, with a minor trade-off in accuracy.

In this study, we extend our work by demonstrating that domain-specific features, rather than the project-specific features demonstrated previously, can be applied across projects while still maintaining performance. We also introduce a hierarchy of context to capture software-engineering knowledge within the contextual space, which produces performance gains. Finally, we highlight the importance of domain-specific contextual features through cross-domain contexts: adding context improved accuracy, and Kappa scores improved by at least 3.8% to 10.8%, depending on the project.
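For readers unfamiliar with the Kappa metric reported above, the following is a minimal sketch of Cohen's kappa, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It assumes that the Kappa scores in the text refer to Cohen's kappa, which the text does not state explicitly; the function name `cohen_kappa` and the labels in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa between two equal-length label sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(actual)
    # Observed agreement: fraction of positions where the labels match.
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_e = sum(actual_counts[label] * predicted_counts[label]
              for label in actual_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (a single label everywhere); returning 0.0 is a
        # convention of this sketch, not a standard definition.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    actual = ["dup", "dup", "non", "non"]
    predicted = ["dup", "non", "non", "non"]
    print(cohen_kappa(actual, predicted))
```

Unlike raw accuracy, kappa discounts the agreement that a classifier would reach by chance, which matters when duplicates and non-duplicates are unevenly represented.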
KEYWORDS: deduplication, documentation, duplicate bug reports, information retrieval, machine learning, software engineering textbooks, software literature

INTRODUCTION

Modern software projects use issue-tracking systems to record bug/issue reports, a colloquial term for the issues that developers, testers, and users encounter while using a particular software system. Primarily, these tracking systems serve as a store of bug reports, stack traces, and feature requests, and are sometimes used to measure developers' productivity based on their progress in addressing issues. Bug reports are usually written in natural language; as a result, the same issue can be described in different ways by the project developers and testers and by the system users who encounter the issue. Typically, the vocabulary used by developers differs from that used by users, and can vary among users depending on their level of technical sophistication. Currently, many projects are forced to use a triager, often an experienced developer, to "translate" bug reports into a more technical language, relevant to developers. Duplicate bug reports waste the triager's and developers' time. If manual triaging effort could be reduced, developer productivity would be increased, as developers would not have to consider multiple reports for the same bug and would have more information about each bug report, enabling them to fix each bug faster. Considerable research has been done on automated metho...

† All of the word lists and bug datasets used in this paper can be found online at: https://bitbucket.org/kaggarwal32/bug-deduping-dataset