2019
DOI: 10.1007/978-3-030-27455-9_2

A Systematic Comparison of Search Algorithms for Topic Modelling—A Study on Duplicate Bug Report Identification

Abstract: Latent Dirichlet Allocation (LDA) has been used to support many software engineering tasks. Previous studies showed that default settings lead to sub-optimal topic modeling with a dramatic impact on the performance of such approaches in terms of precision and recall. For this reason, researchers used search algorithms (e.g., genetic algorithms) to automatically configure topic models in an unsupervised fashion. While previous work showed the ability of individual search algorithms in finding near-optimal confi…

Cited by 8 publications (9 citation statements)
References 31 publications (74 reference statements)
“…Through manual analysis, we identified 530 (81.11%) commits discussing one or more performance-related issues. We further expanded the keywords by using textual analysis methods (De Lucia et al, 2014) and topic modeling (Panichella et al, 2013; Panichella, 2019). This resulted in 1640 additional commits, of which 163 commits (9.37%) contained one or more self-admitted performance-related issues.…”
Section: Rq2: How Prevalent Are Cps-specific Performance Antipatterns...
Citation type: mentioning
confidence: 99%
“…Then, we pre-process these artifacts by tokenizing the commit message, removing stop words, and stemming. First, tokenization aims to extract words in the text and remove non-relevant characters, such as punctuation marks, special characters, and numbers (Panichella, 2019). As commit messages can contain code snippets, we split compound names (i.e., identifiers) into tokens using camel case and snake case splitting (Panichella et al, 2016).…”
Section: Abstractet Expansion With Information Retrieval and Topic Mo...
Citation type: mentioning
confidence: 99%
“…Reports. Many research projects have focused on detecting duplicate textual bug reports [28,29,35,36,52,54,55,64,70,72,74,82,84,86,87,89,90,93-97,99-101,107]. Similar to Tango, most of the proposed techniques return a ranked list of duplicate candidates [35,63].…”
Section: Detection Of Duplicate Textual Bug
Citation type: mentioning
confidence: 99%
“…This step includes stemming and removal of stop words. Then either the Term-by-document matrix or the probabilistic models are generated which are then used to calculate the textual similarities [24]. Term-by-Document matrix includes vocabulary which is also referred to as Terms as rows and the documents as the columns.…”
Section: A Information-retrieval Based
Citation type: mentioning
confidence: 99%
“…[35] used the LDA and LSI approaches to find how continuously querying the bug report like how it happens in Google Search Engine helps find the duplicate bug reports. The author in [24] in their paper compared five metaheuristics GA, DE, Particle Swarm Optimization, Simulated Annealing and Random Search, to analyze how the LDA works when applied.…”
Section: A Information-retrieval Based
Citation type: mentioning
confidence: 99%