2005
DOI: 10.1145/1082983.1083151

Toward mining "concept keywords" from identifiers in large software projects

Abstract: We propose the Concept Keyword Term Frequency/Inverse Document Frequency (ckTF/IDF) method as a novel technique to efficiently mine concept keywords from identifiers in large software projects. ckTF/IDF is suitable for mining concept keywords because it is more lightweight than the TF/IDF method and its heuristics are tuned for identifiers in programs. We then experimentally apply ckTF/IDF to our educational operating system udos, consisting of around 5,000 lines of C code, which produced…
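
For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of standard TF/IDF applied to words split out of identifiers; it is not the ckTF/IDF method, whose heuristics the abstract does not spell out. The helper names (`split_identifier`, `tf_idf`), the splitting rule, and the sample file and identifier names are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase words.

    The splitting rule is an assumption for this sketch, not the paper's.
    """
    words = []
    for part in name.split("_"):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def tf_idf(files):
    """Compute standard TF/IDF per source file.

    files: dict mapping file name -> list of identifiers found in that file.
    Returns: dict mapping file name -> {word: score}.
    """
    counts = {
        fname: Counter(w for ident in idents for w in split_identifier(ident))
        for fname, idents in files.items()
    }
    n_files = len(counts)

    # Document frequency: in how many files does each word occur?
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())

    scores = {}
    for fname, c in counts.items():
        total = sum(c.values())
        scores[fname] = {
            w: (n / total) * math.log(n_files / doc_freq[w])
            for w, n in c.items()
        }
    return scores


# Hypothetical identifiers from two files of a small kernel.
example_files = {
    "fs.c": ["read_inode", "write_inode", "inode_table"],
    "mm.c": ["page_table", "allocPage"],
}
example_scores = tf_idf(example_files)
top_fs = max(example_scores["fs.c"], key=example_scores["fs.c"].get)
top_mm = max(example_scores["mm.c"], key=example_scores["mm.c"].get)
```

In this toy input, a word that appears in every file (here `table`) scores zero, while a word concentrated in one file ranks highest for that file. That is the general behavior a keyword-mining heuristic of this family relies on.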

Cited by 12 publications (7 citation statements)
References 12 publications
“…High quality identifier names lie at the heart of software engineering [6,16,20,24,42,66,69,54]; they drive code readability and comprehension [12,19,20,41,44,68]. According to Deißenböck and Pizka [17], identifiers represent the majority (70%) of source code tokens.…”
Section: Related Work (mentioning)
confidence: 99%
“…• Observation 2: The information contained in the text of change logs and release notes of a software product is represented with some keywords. In the two types of observations, Observation 2 has been widely accepted in the text mining community, i.e., keyword mining has become a standard text mining technique [10]; Observation 1 is supported by the evidences reported by previous studies. For example, in [6], Baysal and Malton found that the non-source code documents contain similar amount of contents of source code changes in software maintenance and evolution, which indicates that non-source code documents, such as email archives, release notes, and change logs, might accurately record the maintenance and evolution activity of a software product.…”
Section: Mapping Activities To Abstract (mentioning)
confidence: 82%
“…Contextual query reformulation relies on SWUM's phrasal concepts to extract phrases from source code because existing techniques for extracting phrases did not meet the needs of the concern location problem. There is work on automatically extracting topic words and phrases from source code [67,71], displaying search results in a concept lattice of keywords [72], and clustering program elements that share similar phrases [46]. Although useful for exploring the overall word usage of an unfamiliar software system, these techniques are not sufficient for exploring all usage.…”
Section: Contextual Query Reformulation (mentioning)
confidence: 99%
“…Although useful for exploring the overall word usage of an unfamiliar software system, these techniques are not sufficient for exploring all usage. In contrast to the contextual approach, these approaches either filter the topics based on perceived importance to the system [46,71,72], or do not produce human understandable topic labels [67]. Since it is impossible to predict a priori what will be of interest to the developer, the contextual approach lets the developer filter the results with a natural language query, and uses human-readable extracted phrases.…”
Section: Contextual Query Reformulation (mentioning)
confidence: 99%