Context: Software projects rely on their issue tracking systems to guide the maintenance activities of software developers. Bug reports submitted to these systems carry crucial information about the nature of a crash, such as text written by users or developers and execution information about the functions that were running before the crash occurred. Typically, big software projects receive thousands of reports every day.
Objective: The aim is to reduce the time and effort required to fix bugs while improving overall software quality. Previous studies have shown that a large number of bug reports are duplicates of previously reported ones. For example, as many as 30% of all reports for Firefox are duplicates.
Method: While a wide variety of approaches exist to automatically detect duplicate bug reports using natural language processing, only a few have considered the execution information (the so-called stack traces) inside bug reports. In this paper, we propose a novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models (HMMs).
Results: Applying our approach to the Firefox and GNOME datasets, we show that, for Firefox, the average recall at rank k=1 is 59% and at rank k=2 is 75.55%. Recall reaches 90% from k=10. The Mean Average Precision (MAP) value is up to 76.5%. For GNOME, the recall at k=1 is around 63%, and this value increases by about 10% for k=2. The recall increases to 97% for k=11. A MAP value of up to 73% is achieved.
Conclusion: We show that HMMs and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
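
The two metrics reported in the results, recall at rank k and MAP, can be illustrated with a minimal sketch. It assumes the common information-retrieval definitions, which may differ in detail from those used in the paper: recall at rank k is the fraction of incoming reports whose true duplicate group appears among the top k ranked candidates, and MAP averages the per-report average precision. The function names, report identifiers, and group labels below are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set


def recall_at_k(rankings: Dict[str, List[str]],
                relevant: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of reports with at least one correct candidate in the top k.

    Assumed definition (standard in duplicate detection, not confirmed
    to match the paper's exact formulation).
    """
    hits = sum(
        1
        for report, ranked in rankings.items()
        if any(candidate in relevant[report] for candidate in ranked[:k])
    )
    return hits / len(rankings)


def mean_average_precision(rankings: Dict[str, List[str]],
                           relevant: Dict[str, Set[str]]) -> float:
    """Mean over reports of the average precision of each ranked list."""
    total = 0.0
    for report, ranked in rankings.items():
        hits = 0
        precision_sum = 0.0
        for position, candidate in enumerate(ranked, start=1):
            if candidate in relevant[report]:
                hits += 1
                precision_sum += hits / position  # precision at this hit
        if relevant[report]:
            total += precision_sum / len(relevant[report])
    return total / len(rankings)


# Hypothetical toy data: three incoming reports, each ranked against
# three candidate duplicate groups; "g1" is the correct group for all.
rankings = {
    "r1": ["g1", "g2", "g3"],  # correct group at rank 1
    "r2": ["g2", "g1", "g3"],  # correct group at rank 2
    "r3": ["g3", "g2", "g1"],  # correct group at rank 3
}
relevant = {"r1": {"g1"}, "r2": {"g1"}, "r3": {"g1"}}

r1 = recall_at_k(rankings, relevant, 1)
r2 = recall_at_k(rankings, relevant, 2)
map_score = mean_average_precision(rankings, relevant)
```

On this toy data, one of three reports is resolved at k=1 and two of three at k=2, which mirrors how the reported recall grows as k increases.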