2020
DOI: 10.1145/3418463

IR2Vec

Abstract: We propose IR2Vec, a concise and scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow information to capture the syntax as well as the semantics of the input programs. As our infrastructure is based on the Intermediate Representation (IR) of the source code, obtained embeddings are both language and machine independent. The entities of the IR …
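To picture the idea described in the abstract, a program-level embedding can be composed bottom-up from vectors of IR entities. A minimal sketch in Python, assuming a toy vocabulary, random seed vectors, and a simple weighted sum (none of which are IR2Vec's actual seed embeddings or composition rules):

```python
# Minimal sketch: compose a program-level embedding from per-entity
# vectors of its IR instructions. The vocabulary, dimension, and the
# weighted sum below are illustrative assumptions, not the actual
# IR2Vec seed embeddings or composition rules.
import numpy as np

DIM = 8  # toy embedding dimension (the real dimension differs)
rng = np.random.default_rng(0)

# Hypothetical seed embeddings for IR entities (opcodes, types, operands).
seed = {ent: rng.normal(size=DIM)
        for ent in ["add", "load", "store", "i32", "ptr", "var", "const"]}

def instruction_vector(opcode, type_, operands, w=(1.0, 0.5, 0.2)):
    """Combine opcode, type, and operand vectors with fixed weights."""
    vec = w[0] * seed[opcode] + w[1] * seed[type_]
    for op in operands:
        vec += w[2] * seed[op]
    return vec

def program_vector(instructions):
    """Sum instruction vectors into one program-level embedding."""
    return sum(instruction_vector(*ins) for ins in instructions)

# Example: embed a two-instruction toy program.
prog = [("load", "i32", ["ptr"]), ("add", "i32", ["var", "const"])]
print(program_vector(prog))
```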

Cited by 45 publications (7 citation statements)
References 41 publications
“…Machine Learning models. We also note that more or less computation-intensive ML models are used for prediction, ranging from simple decision trees or support vector machines [39], [48], [49], to complex deep learning methods [41], [45]. Multiple models can also be considered, as illustrated by Roy et al. [38].…”
Section: Discussion
confidence: 99%
“…They can match or even surpass advanced methods using only a simple LSTM [21] and pre-trained embeddings. VenkataKeerthy et al. [22] provided IR2Vec, a concise and scalable encoding infrastructure to represent programs as distributed embeddings in continuous space. This method constructs symbolic and flow-aware embeddings for LLVM entities and maps them to real-valued distributed embeddings.…”
Section: Intermediate Representation
confidence: 99%
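The "flow-aware" part of the excerpt above can be pictured as propagating value-flow information into each instruction's vector. A minimal sketch, assuming a simple fixed-point blend over use-def edges (the update rule, mixing factor, and function names are illustrative, not IR2Vec's actual algorithm):

```python
# Sketch of a "flow-aware" refinement: blend each instruction's symbolic
# vector with the vectors of instructions whose values it uses.
# The fixed-point loop and the 0.3 mixing factor are assumptions.
import numpy as np

def flow_aware(sym_vecs, uses, alpha=0.3, iters=10):
    """sym_vecs: symbolic vector per instruction.
    uses: uses[i] = indices of instructions that instruction i reads from."""
    vecs = [v.copy() for v in sym_vecs]
    for _ in range(iters):
        for i, deps in enumerate(uses):
            if deps:
                incoming = np.mean([vecs[j] for j in deps], axis=0)
                vecs[i] = (1 - alpha) * sym_vecs[i] + alpha * incoming
    return vecs

# Toy example: instruction 2 uses values defined by instructions 0 and 1.
rng = np.random.default_rng(1)
sym = [rng.normal(size=4) for _ in range(3)]
print(flow_aware(sym, uses=[[], [], [0, 1]])[2])
```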
“…One solution is representing code in different languages with a uniform compiler-generated intermediate representation (IR). The model could be trained on IR (Ben-Nun, Jakobovits, and Hoefler 2018; VenkataKeerthy et al. 2020) rather than on the source code, allowing it to learn common patterns across different languages. However, obtaining IR for different programming languages requires intensive domain expertise and engineering efforts to fix compilation errors, making it infeasible for language extension.…”
Section: Introduction
confidence: 99%
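The excerpt's premise, lowering several source languages to one uniform IR, can be sketched with clang, which emits textual LLVM IR for both C and C++. A minimal sketch, assuming clang and clang++ are on PATH and the file paths are placeholders:

```python
# Sketch: lower sources in different languages to a single IR (LLVM).
# Assumes clang/clang++ are installed; file names are placeholders.
import subprocess

def to_llvm_ir(source_path, out_path):
    """Emit textual LLVM IR (.ll) for a C or C++ source file."""
    compiler = "clang++" if source_path.endswith(".cpp") else "clang"
    subprocess.run(
        [compiler, "-S", "-emit-llvm", source_path, "-o", out_path],
        check=True,
    )

# Both land in the same representation, so one model can consume both.
to_llvm_ir("example.c", "example_c.ll")
to_llvm_ir("example.cpp", "example_cpp.ll")
```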