Toward mining "concept keywords" from identifiers in large software projects

Ohba, Masaru; Gondow, Katsuhiko

doi:10.1145/1082983.1083151

Cited by 12 publications

(7 citation statements)

References 12 publications

“…High quality identifier names lie at the heart of software engineering [6,16,20,24,42,66,69,54]; they drive code readability and comprehension [12,19,20,41,44,68]. According to Deißenböck and Pizka [17], identifiers represent the majority (70%) of source code tokens.…”

Section: Related Work (mentioning)

confidence: 99%

Learning natural coding conventions

Allamanis, Barr, Bird, et al. 2014

Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering

Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project's coding conventions. However, one third of reviews of changes contain feedback about coding conventions, indicating that programmers do not always follow them and that project members care deeply about adherence. Unfortunately, programmers are often unaware of coding conventions because inferring them requires a global view, one that aggregates the many local decisions programmers make and identifies emergent consensus on style. We present NATURALIZE, a framework that learns the style of a codebase, and suggests revisions to improve stylistic consistency. NATURALIZE builds on recent work in applying statistical natural language processing to source code. We apply NATURALIZE to suggest natural identifier names and formatting conventions. We present four tools focused on ensuring natural code during development and release management, including code review. NATURALIZE achieves 94% accuracy in its top suggestions for identifier names and can even transfer knowledge about conventions across projects, leveraging a corpus of 10,968 open source projects. We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.

“…• Observation 2: The information contained in the text of change logs and release notes of a software product is represented with some keywords. In the two types of observations, Observation 2 has been widely accepted in the text mining community, i.e., keyword mining has become a standard text mining technique [10]; Observation 1 is supported by the evidences reported by previous studies. For example, in [6], Baysal and Malton found that the non-source code documents contain similar amount of contents of source code changes in software maintenance and evolution, which indicates that non-source code documents, such as email archives, release notes, and change logs, might accurately record the maintenance and evolution activity of a software product.…”

Section: Mapping Activities To Abstract (mentioning)

confidence: 82%

Mining Change Logs and Release Notes to Understand Software Maintenance and Evolution

Yu 2009

CLEIej

Software change logs and release notes are documents released together with new versions of a software product. They contain the description of the changes made to the previous version and the new features introduced in the new version. In this paper, we present a keyword-based approach to mining and analyzing non-source code documents and define a mathematical framework to represent the data. This approach is applied in the study of the change logs of Linux and the release notes of FreeBSD. The results show that the software maintenance process and evolution process share some common properties and the keyword-based text mining technique could be used as a systematic method to study software maintenance and evolution.

“…Contextual query reformulation relies on SWUM's phrasal concepts to extract phrases from source code because existing techniques for extracting phrases did not meet the needs of the concern location problem. There is work on automatically extracting topic words and phrases from source code [67,71], displaying search results in a concept lattice of keywords [72], and clustering program elements that share similar phrases [46]. Although useful for exploring the overall word usage of an unfamiliar software system, these techniques are not sufficient for exploring all usage.…”

Section: Contextual Query Reformulation (mentioning)

confidence: 99%

“…Although useful for exploring the overall word usage of an unfamiliar software system, these techniques are not sufficient for exploring all usage. In contrast to the contextual approach, these approaches either filter the topics based on perceived importance to the system [46,71,72], or do not produce human understandable topic labels [67]. Since it is impossible to predict a priori what will be of interest to the developer, the contextual approach lets the developer filter the results with a natural language query, and uses human-readable extracted phrases.…”

Section: Contextual Query Reformulation (mentioning)

confidence: 99%

Natural Language-Based Software Analyses and Tools for Software Maintenance

Pollock, Vijay‐Shanker, Hill, et al. 2013

Software Engineering

Significant portions of software life cycle resources are devoted to program maintenance, which motivates the development of automated techniques and tools to support the tedious, error-prone tasks. Natural language clues from programmers' naming in literals, identifiers, and comments can be leveraged to improve the effectiveness of many software tools. For example, they can be used to increase the accuracy of software search tools, improve the ability of program navigation tools to recommend related methods, and raise the accuracy of other program analyses by providing access to natural language information. This chapter focuses on how to capture, model, and apply the programmers' conceptual knowledge expressed in both linguistic information as well as programming language structure and semantics. We call this kind of analysis Natural Language Program Analysis (NLPA) since it combines natural language processing techniques with program analysis to extract information for analysis of the source program.
