2012 34th International Conference on Software Engineering (ICSE) 2012
DOI: 10.1109/icse.2012.6227135
On the naturalness of software

Abstract: Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of stati…

Cited by 592 publications (659 citation statements)
References 36 publications
“…In those areas, they are used for ranking candidate sentences, such as candidate translations of a foreign language sentence, based on how natural they are in the target language. To our knowledge, Hindle et al [1] were the first to apply language models to source code.…”
Section: Language Models for Programming Languages
confidence: 99%
“…Recently, Hindle et al [1] presented pioneering work in learning language models over source code, that represent broad statistical characteristics of coding style. Language models (LMs) are simply probability distributions over strings.…”
Section: Introduction
confidence: 99%
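The excerpt above characterizes language models as probability distributions over strings. As a concrete illustration only (not the models used in the paper), a minimal bigram model with add-k smoothing over tokenized code could look like this; the toy corpus and smoothing constant are invented for the sketch:

```python
from collections import Counter, defaultdict

def train_bigram_lm(token_lists, smoothing=0.1):
    """Train an add-k-smoothed bigram model over tokenized code.

    Returns a function prob(prev, cur) giving P(cur | prev).
    """
    bigrams = defaultdict(Counter)
    vocab = set()
    for tokens in token_lists:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[prev][cur] += 1
    vocab_size = len(vocab)

    def prob(prev, cur):
        counts = bigrams[prev]
        return (counts[cur] + smoothing) / (sum(counts.values()) + smoothing * vocab_size)

    return prob

# Tiny invented "corpus" of tokenized code lines:
corpus = [["for", "i", "in", "range", "(", "n", ")", ":"],
          ["for", "x", "in", "xs", ":"]]
p = train_bigram_lm(corpus)

# A bigram seen in training scores higher than an unseen one:
assert p("for", "i") > p("for", "return")
```

Such a model assigns higher probability to token sequences resembling its training corpus, which is what lets it rank candidate completions or translations by "naturalness."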
“…While there are many ways in which tokens in a block can be ordered, e.g., alphabetical order, length of tokens, occurrence frequency of a token in a corpus, etc., a natural question is what order is most effective in this context. As it turns out, software vocabulary exhibits very similar characteristics to a natural-language corpus and also follows Zipf's law [13,40]. That is, there are a few very popular (frequent) tokens, and the frequency of tokens decreases very rapidly with rank.…”
Section: Sub-Block Overlap Filtering
confidence: 99%
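The excerpt above asks how tokens in a block should be ordered for overlap filtering, given Zipf-distributed frequencies. One plausible sketch, with entirely hypothetical corpus frequencies, is to sort a block's tokens rarest-first, so that comparisons between candidate clone blocks encounter the most discriminative tokens early:

```python
from collections import Counter

# Hypothetical, Zipf-like corpus frequencies: a few very frequent
# punctuation/keyword tokens, many rare identifiers.
corpus_freq = Counter({"(": 500, ")": 500, "=": 320, "if": 150,
                       "return": 90, "parseConfig": 3, "retryDelay": 2})

def order_block(tokens, freq):
    """Order a block's tokens by ascending corpus frequency (rarest first),
    breaking ties lexicographically for determinism."""
    return sorted(tokens, key=lambda t: (freq[t], t))

block = ["if", "(", "retryDelay", "=", "parseConfig", "(", ")", ")"]
ordered = order_block(block, corpus_freq)
# Rare identifiers lead; ubiquitous punctuation trails.
assert ordered[0] == "retryDelay"
```

Because rare tokens are the least likely to be shared by chance, putting them first lets a filter reject non-matching block pairs after inspecting only a short prefix.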
“…The mental model of the programmer may be something like a language model for speech, but rather applied to code. Language models are typically applied to natural human utterances but they have also been successfully applied to software (Hindle et al., 2012; Raychev et al., 2014; White et al., 2015), and can be used to discover unexpected segments of tokens in source code (Campbell et al., 2014).…”
Section: Introduction
confidence: 99%
“…Thus GrammarGuru uses language models to capture code regularity or naturalness and then looks for irregular code (Campbell et al., 2014). Once the location of a potential error is found, code completion techniques that exploit language models (Hindle et al., 2012; Raychev et al., 2014; White et al., 2015) can be used to suggest possible fixes. Traditional parsers do not rely upon such information.…”
Section: Introduction
confidence: 99%
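The last two excerpts describe using language models first to flag irregular code and then to suggest fixes. A hedged sketch of the flagging step, using per-token surprisal (negative log probability) under a bigram model — the training corpus, suspect line, and smoothing constants are invented for illustration and are not the cited systems:

```python
import math
from collections import Counter, defaultdict

def bigram_counts(token_lists):
    """Count bigrams over tokenized code, with start/end padding."""
    counts = defaultdict(Counter)
    for toks in token_lists:
        for prev, cur in zip(["<s>"] + toks, toks + ["</s>"]):
            counts[prev][cur] += 1
    return counts

def surprisal(counts, prev, cur, k=0.01, vocab_size=1000):
    """Surprisal in bits of seeing `cur` after `prev` (add-k smoothed)."""
    c = counts[prev]
    p = (c[cur] + k) / (sum(c.values()) + k * vocab_size)
    return -math.log2(p)

# Train on "regular" code, then score a suspect line token by token;
# the highest-surprisal token marks the likeliest error site.
train = [["if", "(", "x", ")", "{", "}"]] * 50
counts = bigram_counts(train)

suspect = ["if", "(", "x", "{", "}"]   # missing ")"
scores = [(cur, surprisal(counts, prev, cur))
          for prev, cur in zip(["<s>"] + suspect, suspect)]
worst = max(scores, key=lambda s: s[1])[0]
# "{" directly after "x" never occurs in training, so it is flagged:
assert worst == "{"
```

Once the surprising position is located, the same model's most probable continuations at that point (here, ")") can serve as candidate fixes, which is the completion-based repair idea the excerpt attributes to the cited work.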