2020
DOI: 10.1111/cogs.12921

Do Programmers Prefer Predictable Expressions in Code?

Abstract: Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this …

Citations: cited by 5 publications (7 citation statements)
References: 94 publications (139 reference statements)
“…During pre-training, Corder takes AST-based intermediate representation as input and ignores learning the source code text directly. Such ignorance could make the model less capable of understanding the rich semantics underneath the source code text, such as variable/function names and comments, which are the main resources to expose developers' intentions during coding [8][9][10], and consequently, degrade the quality of learned code representations. ContraCode.…”
Section: Discussion (mentioning)
confidence: 99%
“…While the optimized and obfuscated code provides precise and formal semantics [8], they tend to be unnatural, introducing data structures and variable names that are not commonly used in human-written programs. Existing studies have argued that such formal but unnatural programs are less favorable to human developers [9,10] and obstruct the code models' learning [11]. Also, ContraCode does not generate semantically contradicting programs as hard negative samples.…”
Section: Discussion (mentioning)
confidence: 99%
“…The formal channel, unique to code, affords precise, formal semantics; interpreters, compilers, etc., use this channel. On the other hand, the natural channel (perhaps more probabilistic and noisy) relies on variable names, comments, etc., and is commonly used by humans for code comprehension and communication [15,16]. The formal channel's precision enables semantic preserving code transformation, which supports static analysis, optimization, obfuscation, etc.…”
Section: The Dual Channels Of Code (mentioning)
confidence: 99%
“…However, not all the semantically equivalent code is "natural" [33]-the usual way developers write code and thus, amenable to statistical models [33]. In fact, deviation from such "naturalness" may lead to unintended bugs [53], and increase difficulty of human comprehension [15,16].…”
Section: The Dual Channels Of Code (mentioning)
confidence: 99%