2018
DOI: 10.48550/arxiv.1809.05193
Preprint

Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts

Abstract: Most of the JavaScript code deployed in the wild has been minified, a process in which identifier names are replaced with short, arbitrary and meaningless names. Minified code occupies less space, but also makes the code extremely difficult to manually inspect and understand. This paper presents Context2Name, a deep learning-based technique that partially reverses the effect of minification by predicting natural identifier names for minified names. The core idea is to predict from the usage context of a variab…

Cited by 16 publications (23 citation statements)
References 36 publications
“…In a recent work, CONTEXT2NAME [4] attempted to assign meaningful names to the identifiers based on the context of minified JavaScript codes. They were able to successfully predict 47.5% of meaningful identifiers on 15,000 minified codes using recurrent neural networks.…”
Section: Neural Models
Citation type: mentioning
confidence: 99%
“…While researchers have taken steps in predicting variable names in high-level programming languages [1,3,4,26,41,45], it is worth noting that inferring variable names in decompiled binary code poses a unique set of challenges. High-level programming languages like Java, Python, and JavaScript are syntactically rich: Variable types are preserved in these languages, while they are usually eliminated in binaries.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Vasilescu et al [70] describe an approach to recover original names from minified JavaScript programs based on statistical machine translation (SMT). Bavishi et al [11] accomplish this using a deep learning-based technique. Jaffe et al [37] generate meaningful variable names for decompiled code by combining a translation model trained on a parallel corpus with a language model trained on unmodified C code.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…We find that adding words found in strings and comments appears to have little impact on BPE 5K and 10K, both of which slightly increase the size of the corpus by 1-2%. A vocabulary of 10K words is more than 1,000 times smaller than the initial configuration (11,357,210), at the cost of increasing the number of tokens in the corpus by a factor of 1.7.…”
Section: Byte-pair Encoding
Citation type: mentioning
confidence: 99%
“…However, we are the first to focus on name-value inconsistencies, whereas prior work targets other kinds of problems. Nalin also relates to learned models that predict missing identifier names [12,17,48]. Our work differs by analyzing code with names supposed to be meaningful, instead of targeting obfuscated or compiled code.…”
Section: Introduction
Citation type: mentioning
confidence: 99%