Automatic Duplicate Bug Report Detection using Information Retrieval-based versus Machine Learning-based Approaches

Neysiani, Behzad Soleimani; Babamir, Seyed Morteza

doi:10.1109/icwr49608.2020.9122288

Cited by 16 publications

(6 citation statements)

References 48 publications

“…In [51], Neysiani and Babamir proposed a study aimed at assessing the best DBR detection (or retrieval) approaches. They analyzed both IR-based and Machine Learning (ML) approaches.…”

Section: A Mini-systematic Survey About DBR Detection and Retrieval (mentioning)

confidence: 99%

BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports

Al-Msie’deen

2024

IJCDS

A Bug Tracking System (BTS), such as Bugzilla, is generally used to track submitted Bug Reports (BRs) for a particular software system. Duplicate Bug Report (DBR) retrieval is the process of finding DBRs in the BTS. This process matters because it keeps engineers from spending effort and time on BRs that have already been submitted, so DBRs should be found and retrieved as soon as software users submit them. This paper proposes an automatic approach, BushraDBR (Bushra Duplicate Bug Reports retrieval process), that assists an engineer (called a triager) in retrieving DBRs and stopping duplicates before they start. When a new BR is sent to the Bug Repository (BRE), the engineer uses BushraDBR to check whether it duplicates an existing BR in the BRE. If it does, the engineer marks it as a DBR and the BR is excluded from any additional work; otherwise, the BR is added to the BRE. BushraDBR relies on Textual Similarity (TS) between the newly submitted BR and the rest of the BRs in the BRE to retrieve DBRs. It exploits unstructured data from BRs to apply Information Retrieval (IR) methods efficiently, using two techniques: Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of BushraDBR lies in stopping DBRs before they occur by comparing the newly reported BR with the rest of the BRs in the BTS, thus saving time and effort during the Software Maintenance (SM) process, and in its combined use of LSI and FCA for DBR retrieval. BushraDBR was validated and evaluated on several publicly available data sets from Bugzilla. Experiments show that BushraDBR retrieves DBRs efficiently and accurately.
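As a rough illustration of the textual-similarity step this abstract describes, the sketch below ranks stored BRs against a newly submitted one. It is not BushraDBR itself: it swaps LSI and FCA for plain TF-IDF weighting with cosine similarity, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidates`) and the sample reports are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(new_report, repository):
    """Rank stored reports by similarity to the new one, best match first."""
    vectors = tfidf_vectors(repository + [new_report])
    query = vectors[-1]
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors[:-1])]
    return sorted(scored, reverse=True)


# Hypothetical repository contents and incoming report.
repository = [
    "Browser crashes when opening a PDF attachment",
    "Dark mode setting is not saved after restart",
    "Toolbar icons are blurry on high DPI displays",
]
new_report = "Crash on opening PDF attachment in browser"

ranking = rank_candidates(new_report, repository)
best_score, best_index = ranking[0]
print(f"Most similar stored report: #{best_index} (score {best_score:.2f})")
```

In a triage workflow of the kind described, the top-ranked reports would be shown to the triager, who decides whether the new BR is a duplicate; any similarity cut-off would have to be tuned on real data.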

“…Neysiani et al compared IR-based and ML-based methods for bug report deduplication [12], and the experimental results showed no significant difference in terms of accuracy or runtime efficiency. Campbell et al conducted a quantitative analysis of commonly used bug classification methods, including signature-based approaches (such as functions, addresses, and linked libraries) and text-tokenized methods.…”

Section: Related Survey (mentioning)

confidence: 99%

“…For example, for bugs with priority p1, the accuracy, precision, recall, and F measure were 0.732, 0.871, 0.732, and 0.796, respectively. Neysiani et al proposed a feature extraction model to aid in bug triage deduplication [12]. The model aggregates various features extracted from bug reports, including multiple text features extracted using TF-IDF, time features, context features, and classification features.…”

Section: Information Retrieval Approaches For Deduplication and Triage (mentioning)

confidence: 99%

A Survey on Bug Deduplication and Triage Methods from Multiple Points of View

Qian, Zhang, Nie et al.

2023

Applied Sciences

To address the issue of insufficient testing caused by the continuous reduction of software development cycles, many organizations maintain bug repositories and bug tracking systems to ensure real-time updates of bugs. However, each day, a large number of bugs is discovered and sent to the repository, which imposes a heavy workload on bug fixers. Therefore, effective bug deduplication and triage are of great significance in software development. This paper provides a comprehensive investigation and survey of the recent developments in bug deduplication and triage. The study begins by outlining the roadmap of the existing literature, including the research trends, mathematical models, methods, and commonly used datasets in recent years. Subsequently, the paper summarizes the general process of the methods from two perspectives—runtime information-based and bug report-based perspectives—and provides a detailed overview of the methodologies employed in relevant works. Finally, this paper presents a detailed comparison of the experimental results of various works in terms of usage methods, datasets, accuracy, recall rate, and F1 score. Drawing on key findings, such as the need to improve the accuracy of runtime information collection and refine the description information in bug reports, we propose several potential future research directions in the field, such as stack trace enrichment and the combination of new NLP models.

“…For the above reasons, researches proposed various techniques based on text mining (Chaparro, 2017;Zhang, Chen, Yang, Lee and Luo, 2016) and machine learning (ML) (Zhang, Wang, Hao, Xie, Zhang and Mei, 2015;Tan, Xu, Wang, Zhang, Xu and Luo, 2020) to automate bug report processing. The widely employed ML techniques include Naïve Bayes (NB) (Lamkanfi, Demeyer, Giger and Goethals, 2010;Abdelmoez, Kholief and Elsalmy, 2012), Random Forest (RF) and Support Vector Machine (SVM) (Neysiani and Babamir, 2020), and k-nearest neighbors (K-NN) (Hamdy and El-Laithy, 2019). However, the performance of the ML techniques is not satisfactory (Ramay, Umer, Yin, Zhu and Illahi, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Deep Bug Reports Processing (DBRP): A Systematic Literature Review

Ahmad, Kholief et al.

2023

Preprint

Many software projects use a Bug Tracking System (BTS) to manage and process bug reports. Over the years, the number of bug report submissions has increased exponentially, with some projects receiving about a hundred submissions daily. Bug report processing (BRP) consists of five key processes: duplicate detection, severity prediction, fix-time prediction, bug triage, and bug localization. Previously, models based on traditional machine learning (ML) algorithms were proposed to automate BRP tasks. In recent years, however, deep BRP (DBRP) models have been proposed to exploit the ever-increasing bug repositories for automatic extraction of semantic and contextual features of bug reports. Although some papers have reviewed related literature from many perspectives of software maintenance, no existing work has comprehensively reviewed the state of DBRP. In this paper, we review the state of DBRP models in the five key BRP tasks. For this, we collect papers published between 2015 and 2021 from four international databases. We evaluate the papers with regard to the deep neural networks, text representation models, and bug report features utilized. Finally, we present findings and prospects for future research in DBRP. Our review analyzes how word embeddings, CNNs, LSTMs, and attention mechanisms are used to represent bug reports and source code, as well as model performance and challenges such as data imbalance, feature utilization, and complexity.
