A Systematic Comparison of Search Algorithms for Topic Modelling—A Study on Duplicate Bug Report Identification

Panichella, Annibale

doi:10.1007/978-3-030-27455-9_2

Cited by 8 publications

(9 citation statements)

References 31 publications

(74 reference statements)

“…Through manual analysis, we identified 530 (81.11%) commits discussing one or more performance-related issues. We further expanded the keywords by using textual analysis methods (De Lucia et al, 2014) and topic modeling (Panichella et al, 2013; Panichella, 2019). This resulted in 1640 additional commits, of which 163 commits (9.37%) contained one or more self-admitted performance-related issues.…”

Section: RQ2: How Prevalent Are CPS-specific Performance Antipatterns... (mentioning)

confidence: 99%

“…Then, we pre-process these artifacts by tokenizing the commit message, removing stop words, and stemming. First, tokenization aims to extract words in the text and remove nonrelevant characters, such as punctuation marks, special characters, and numbers (Panichella, 2019). As commit messages can contain code snippets, we split compound names (i.e., identifiers) into tokens using camel case and snake case splitting (Panichella et al, 2016).…”
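The pre-processing steps quoted above (tokenization, camel-case and snake-case identifier splitting, stop-word removal, stemming) can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' pipeline. The stop-word list is a tiny hypothetical one, and `crude_stem` is a naive suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real pipelines use a full list.
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "on", "for",
              "is", "it", "this", "with", "when"}

# Naive suffix stripping: a crude stand-in for a real stemmer (e.g. Porter).
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "ed", "es", "s")


def split_identifier(token: str) -> list[str]:
    """Split snake_case and camelCase (including acronyms) into lower-case words."""
    words = []
    for piece in token.split("_"):
        # Acronym before a capitalized word | capitalized or lower-case word | trailing acronym
        words.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+", piece))
    return [w.lower() for w in words]


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    tokens = []
    # Letters and underscores only: punctuation, special characters and numbers are dropped.
    for raw in re.findall(r"[A-Za-z_]+", text):
        for word in split_identifier(raw):
            if len(word) > 1 and word not in STOP_WORDS:
                tokens.append(crude_stem(word))
    return tokens


print(preprocess("Fix crash in getUserName when loading 3 users!"))
```

Consider a commit message such as `Fix crash in getUserName when loading 3 users!`. Here `preprocess` yields `['fix', 'crash', 'get', 'user', 'name', 'load', 'user']`. The number and punctuation are dropped, and the camel-case identifier is split before stemming.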

Section: Abstractet Expansion With Information Retrieval and Topic Mo... (mentioning)

confidence: 99%

The Slow and the Furious? Performance Antipattern Detection in Cyber-Physical Systems

Dinten, Derakhshanfar, Panichella, et al. 2023

Preprint

“…Reports. Many research projects have focused on detecting duplicate textual bug reports [28,29,35,36,52,54,55,64,70,72,74,82,84,86,87,89,90,93-97,99-101,107]. Similar to TANGO, most of the proposed techniques return a ranked list of duplicate candidates [35,63].…”
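A ranked list of duplicate candidates, as described in the quote above, can be illustrated with a minimal sketch. It scores every existing report against a new one and sorts the results by score. Jaccard similarity over lower-cased, whitespace-separated tokens is used here purely as a simple stand-in. It is not the similarity measure of TANGO or of the cited techniques, and the report IDs and texts are made up.

```python
def tokens(text: str) -> set[str]:
    """Lower-cased, whitespace-separated token set (no other preprocessing)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_duplicates(query: str, reports: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (report_id, score) pairs, most similar first."""
    q = tokens(query)
    scored = [(rid, jaccard(q, tokens(text))) for rid, text in reports.items()]
    # Highest similarity first; ties broken by report ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


reports = {  # made-up example reports
    "BUG-1": "app crashes when saving file",
    "BUG-2": "login button misaligned",
    "BUG-3": "crash when saving large file",
}
print(rank_duplicates("app crashes on saving a file", reports, top_k=2))
```

For the query above, BUG-1 ranks first because it shares 4 of the 7 distinct tokens. BUG-3 follows with a score of 2/9, because the plain token sets do not match `crash` against `crashes`.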

Section: Detection Of Duplicate Textual Bug (mentioning)

confidence: 99%

It Takes Two to Tango: Combining Visual and Textual Information for Detecting Duplicate Video-Based Bug Reports

Cooper, Bernal-Cárdenas, Chaparro, et al. 2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

When a bug manifests in a user-facing application, it is likely to be exposed through the graphical user interface (GUI). Given the importance of visual information to the process of identifying and understanding such bugs, users are increasingly making use of screenshots and screen-recordings as a means to report issues to developers. However, when such information is reported en masse, such as during crowd-sourced testing, managing these artifacts can be a time-consuming process. As the reporting of screen-recordings in particular becomes more popular, developers are likely to face challenges related to manually identifying videos that depict duplicate bugs. Due to their graphical nature, screen-recordings present challenges for automated analysis that preclude the use of current duplicate bug report detection techniques. To overcome these challenges and aid developers in this task, this paper presents TANGO, a duplicate detection technique that operates purely on video-based bug reports by leveraging both visual and textual information. TANGO combines tailored computer vision techniques, optical character recognition, and text retrieval. We evaluated multiple configurations of TANGO in a comprehensive empirical evaluation on 4,860 duplicate detection tasks that involved a total of 180 screen-recordings from six Android apps. Additionally, we conducted a user study investigating the effort required for developers to manually detect duplicate video-based bug reports and compared this to the effort required to use TANGO. The results reveal that TANGO's optimal configuration is highly effective at detecting duplicate video-based bug reports, accurately ranking target duplicate videos in the top-2 returned results in 83% of the tasks. Additionally, our user study shows that, on average, TANGO can reduce developer effort by over 60%, illustrating its practicality.

“…This step includes stemming and removal of stop words. Then either the term-by-document matrix or the probabilistic models are generated which are then used to calculate the textual similarities [24]. Term-by-Document matrix includes vocabulary which is also referred to as Terms as rows and the documents as the columns.…”
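The term-by-document representation described in the quote can be sketched as follows. Rows are vocabulary terms, columns are documents, and each cell holds a raw term count. Cosine similarity between two document columns is one common way to turn this matrix into a textual similarity score. The helper names are illustrative. Real IR pipelines usually apply a weighting scheme such as tf-idf before comparing columns.

```python
import math
from collections import Counter


def term_by_document(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    vocabulary = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


def column(matrix: list[list[int]], j: int) -> list[int]:
    """Extract the vector of document j (one entry per term)."""
    return [row[j] for row in matrix]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [["crash", "save", "file"], ["crash", "save", "file", "large"], ["login", "button"]]
vocab, m = term_by_document(docs)
print(cosine(column(m, 0), column(m, 1)))
```

The first two toy documents share three terms, so their columns have a cosine similarity of √3/2 ≈ 0.866. The unrelated login document scores 0 against both. LSI works on this same kind of matrix by applying a truncated singular value decomposition to it.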

Section: A Information-retrieval Based (mentioning)

confidence: 99%

“…[35] used the LDA and LSI approaches to find how continuously querying the bug report like how it happens in Google Search Engine helps find the duplicate bug reports. The author in [24] in their paper compared five metaheuristics GA, DE, Particle Swarm Optimization, Simulated Annealing and Random Search, to analyze how the LDA works when applied.…”
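Random search is the simplest of the five metaheuristics listed above. It repeatedly samples an LDA configuration (number of topics `k` and the Dirichlet priors `alpha` and `beta`) and keeps the best one under a fitness function. This is a hedged sketch, not the study's setup. The search bounds are hypothetical, and `fitness` is a caller-supplied placeholder. In practice it would train an LDA model with the given configuration and score the result.

```python
import random
from typing import Callable, NamedTuple


class LdaConfig(NamedTuple):
    k: int        # number of topics
    alpha: float  # Dirichlet prior on per-document topic mixtures
    beta: float   # Dirichlet prior on per-topic word distributions


def sample_config(rng: random.Random) -> LdaConfig:
    # Hypothetical bounds, chosen only for illustration.
    return LdaConfig(
        k=rng.randint(10, 100),
        alpha=rng.uniform(0.01, 1.0),
        beta=rng.uniform(0.01, 1.0),
    )


def random_search(
    fitness: Callable[[LdaConfig], float], budget: int, seed: int = 0
) -> tuple[LdaConfig, float]:
    """Sample `budget` configurations uniformly and return the fittest one."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)  # seeded so runs are reproducible
    best_cfg = sample_config(rng)
    best_fit = fitness(best_cfg)
    for _ in range(budget - 1):
        cfg = sample_config(rng)
        fit = fitness(cfg)
        if fit > best_fit:  # maximize fitness
            best_cfg, best_fit = cfg, fit
    return best_cfg, best_fit
```

A toy objective such as `lambda c: -(c.k - 40) ** 2 / 100 - (c.alpha - 0.1) ** 2 - (c.beta - 0.1) ** 2` can stand in for a real LDA quality score when trying the function out. With a fixed seed, a larger budget can only return an equal or better fitness, because the first samples are identical. GA, DE, Particle Swarm Optimization, and Simulated Annealing differ in how they propose the next configuration, but they can plug into the same fitness interface.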

Section: A Information-retrieval Based (mentioning)

confidence: 99%

A Systematic Study of Duplicate Bug Report Detection

Gupta, Gupta 2021

IJACSA

Defects are an integral part of any software project. They can arise at any time, in any phase of software development or maintenance. In open source projects, open bug repositories are used to maintain bug reports. When a new bug report arrives, a person called the "triager" analyzes the report and assigns it to a responsible developer. Before assigning it, however, the triager has to check whether it is a duplicate. Duplicate bug reports are one of the major problems in maintaining bug repositories. Reporters' lack of knowledge and vocabulary sometimes increases the effort this check requires. Bug tracking systems are usually used to maintain bug reports and are the most consulted resource during the maintenance process. Because bug reports are submitted to the tracking system in an uncoordinated way, the same bug is often reported by many users. Duplicate bug reports waste resources and incur economic costs. They create problems for triagers and require a lot of analysis and validation. A lot of work has been done in the field of duplicate bug report detection. In this paper, we systematically present the research done in this field by classifying the works into three categories and listing the methods used in each category. The paper considers papers published up to January 2020. It describes the strengths, limitations, data sets, and major approaches of the popular papers in this field, and also lists the challenges and future directions of research in this field.
