Proceedings of the 22nd International Conference on Program Comprehension 2014
DOI: 10.1145/2597008.2597150

Understanding LDA in source code analysis

Abstract: Latent Dirichlet Allocation (LDA) has seen increasing use in the understanding of source code and its related artifacts, in part because of its impressive modeling power. However, this expressive power comes at a cost: the technique includes several tuning parameters whose impact on the resulting LDA model must be carefully considered. An obvious example is the burn-in period; too short a burn-in period leaves excessive echoes of the initial uniform distribution. The aim of this work is to provide insights into…

Citations: cited by 44 publications (36 citation statements)
References: 12 publications
“…In this article, to effectively use LDA, we apply it in a package-level corpus rather than each class to extract the latent topics to simulate the functional features or concerns for a package since small (class-level) corpus is too small to generate good topics [19][20][21][22][23]. Then, we cluster the classes according to these topics and assign different classes to their corresponding topics [23].…”
Section: Latent Dirichlet Allocation (mentioning)
confidence: 99%
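The package-level corpus described in the statement above can be illustrated with a minimal sketch. It assumes that each class is identified by a dot-separated qualified name and has already been tokenized; the function names `package_of` and `build_package_corpus` and the sample data are hypothetical and are not taken from the cited article.

```python
from collections import defaultdict


def package_of(qualified_class_name):
    """Return the package part of a dot-separated class name.

    Assumption: everything before the last dot is the package.
    Classes without a dot are placed in a "(default)" package.
    """
    head, sep, _ = qualified_class_name.rpartition(".")
    return head if sep else "(default)"


def build_package_corpus(class_tokens):
    """Merge class-level token lists into one document per package."""
    corpus = defaultdict(list)
    for name, tokens in class_tokens.items():
        corpus[package_of(name)].extend(tokens)
    return dict(corpus)


# Hypothetical class-level documents (tokens extracted from source code).
example = {
    "org.app.io.Reader": ["read", "buffer"],
    "org.app.io.Writer": ["write", "buffer"],
    "Main": ["run"],
}
package_corpus = build_package_corpus(example)
```

Each value in `package_corpus` is then one longer document that a topic model could take as input in place of the many short class-level documents.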
“…Program comprehension is one of the most important activities in software maintenance and reverse engineering [8,10,23,44,45]. Clustering techniques are commonly used to decompose a software system into small units for easier comprehension.…”
Section: Related Work (mentioning)
confidence: 99%
“…There are a number of studies focusing on this area [10,[34][35][36][37][38]. Program clustering is one of the effective ways for program comprehension.…”
Section: Related Work (mentioning)
confidence: 99%
“…The collapsed Gibbs sampler also has to be configured with a number of parameters including the number of burn-in iterations, the number of samples, and the sampling interval. These parameters have a significant impact on the resulting topic model in LDA and are subject of studies [11], [12] to help researchers configure LDA when used in the context of source code analysis. Interactive topic modeling has another parameter η which can be used to control the strength of domain knowledge on the inferred topic models, allowing for overriding the user-specified constraints if the underlying data strongly suggests otherwise.…”
Section: B Interactive Topic Modeling (mentioning)
confidence: 99%
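How the three sampler parameters named in the statement above interact can be sketched as a sampling schedule. This is an illustration under one common convention, assumed here rather than taken from the cited papers: the first `burn_in` iterations are discarded, and afterwards the sampler state is kept once every `interval` iterations until `num_samples` samples have been collected. Real implementations may count iterations differently.

```python
def retained_iterations(burn_in, num_samples, interval):
    """Return the 1-based Gibbs iteration numbers whose state is kept.

    Assumed convention: discard the first `burn_in` iterations, then
    keep every `interval`-th iteration until `num_samples` are kept.
    """
    if burn_in < 0 or num_samples < 1 or interval < 1:
        raise ValueError("burn_in must be >= 0; num_samples and interval >= 1")
    return [burn_in + k * interval for k in range(1, num_samples + 1)]


def total_iterations(burn_in, num_samples, interval):
    """Total number of Gibbs iterations the schedule requires."""
    return burn_in + num_samples * interval


schedule = retained_iterations(burn_in=100, num_samples=3, interval=10)
# schedule == [110, 120, 130]
```

Under this convention the run costs `burn_in + num_samples * interval` iterations in total, so a longer burn-in or a wider interval increases runtime without adding samples.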