2007
DOI: 10.1016/j.infsof.2006.10.017

Semantic clustering: Identifying topics in source code

citations
Cited by 391 publications
(257 citation statements)
references
References 36 publications
“…Therefore, a code search engine can facilitate searching in specific topics, or support filtering results by topics. Results from topic modeling on source code show that it is possible to extract meaningful topics from source code automatically (Baldi et al 2008; Kuhn et al 2007). Code search engines should leverage such information.…”
Section: Topic Modeling (mentioning)
confidence: 99%
“…As expected, the equations suggest too few clusters for small systems, but are otherwise remarkably robust across a range of system sizes, and for systems written in different languages. This equation can be compared to an estimate given by Kuhn et al in [10]. For an m × n document-term matrix, where m is the number of documents (classes, instead of methods or functions) and n is the total number of terms over all documents, the authors suggest using a value of (m × n)^0.2.…”
Section: Results (mentioning)
confidence: 99%
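
The estimate quoted in the statement above, (m × n)^0.2 for an m × n document-term matrix, can be made concrete with a small calculation. The following is a minimal sketch, not code from either paper: the function name `suggested_topic_count` is hypothetical, and rounding to the nearest integer with a floor of 1 is an assumption, since the quoted text does not say how the value is turned into a whole number.

```python
def suggested_topic_count(num_documents: int, num_terms: int) -> int:
    """Return the (m * n) ** 0.2 estimate for an m x n document-term matrix.

    num_documents is m (e.g. the number of classes) and num_terms is n
    (the total number of distinct terms over all documents).
    Rounding and the lower bound of 1 are illustrative assumptions.
    """
    if num_documents <= 0 or num_terms <= 0:
        raise ValueError("both matrix dimensions must be positive")
    estimate = (num_documents * num_terms) ** 0.2
    # Assumption: round to the nearest integer and never return fewer than 1.
    return max(1, round(estimate))


if __name__ == "__main__":
    # 100 documents and 1,000 terms give (100 * 1000) ** 0.2, which rounds to 10.
    print(suggested_topic_count(100, 1000))
```

Because the exponent is small, the estimate grows slowly with system size: multiplying the number of matrix cells by 32 only doubles the suggested value, which may be why the citing authors report that such estimates yield few clusters for small systems.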
“…Many authors propose somewhere in the range of 200 to 300 topics [7,8], and a recent study showed "islands of stability" around 300 to 500 topics for document sets in the millions, with performance degrading outside of that range [9]. Kuhn et al suggest using smaller topic values, noting that a smaller number of topics may be warranted for analyzing software corpora because the document count is smaller than typical natural-language corpora [10]. However, source code documents are classes in their research, whereas for us documents are methods or functions.…”
Section: Introduction (mentioning)
confidence: 99%
“…Poshyvanyk and Marcus (2006) Fluri et al use a set-based similarity metric to explore how comments and code evolve over time (Fluri et al 2007). Kuhn et al (2007) proposed the use of IR techniques to exploit linguistic information found in source code, such as identifiers (i.e., class or method) names and comments. Revelle et al (2011) define new feature coupling metrics based on structural and textual source code information.…”
Section: Related Work (mentioning)
confidence: 99%