Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.391

Disentangled Code Representation Learning for Multiple Programming Languages

Abstract: Developing effective distributed representations of source code is fundamental yet challenging for many software engineering tasks such as code clone detection, code search, code translation and transformation. However, current code embedding approaches, which represent the semantics and syntax of code in a mixed way, are less interpretable, and the resulting embeddings cannot easily generalize across programming languages. In this paper, we propose a disentangled code representation learning approach to separa…

Cited by 10 publications (6 citation statements). References 29 publications.
“…Instead, DYPRO [75] and LIGER [76] learn program representations through dynamic execution, from a mixture of symbolic and concrete execution traces. Zhang et al. [77] addressed code representation in a multi-language setting, creating an embedding that separates the semantics of source code from its context. FLOW2VEC [78] is an embedding approach that preserves interprocedural program dependence by approximating high-order proximity.…”
Section: Deep Learning For Code Intelligence
confidence: 99%
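The "high-order proximity" that FLOW2VEC [78] is said to approximate can be illustrated with a toy computation: powers of a graph's adjacency matrix count paths of each length, and a decayed sum over orders yields a proximity score between nodes that are only indirectly connected. The graph, the decay weight, and the truncation order below are made up for illustration; this is not Flow2Vec's actual construction.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Adjacency matrix of a 4-node chain 0 -> 1 -> 2 -> 3,
# standing in for a small value-flow graph.
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]

# The k-th power of A counts paths of length k; a decayed sum over
# orders gives a (truncated) high-order proximity matrix.
beta = 0.5  # decay per order (arbitrary choice)
prox = [[0.0] * 4 for _ in range(4)]
P = [row[:] for row in A]
for k in range(1, 4):
    for i in range(4):
        for j in range(4):
            prox[i][j] += (beta ** k) * P[i][j]
    P = mat_mul(P, A)

# Nodes 0 and 3 share no edge, yet the length-3 path gives them
# a nonzero proximity of beta**3.
print(prox[0][3])  # 0.125
```

Embedding methods in this family then factorize such a proximity matrix into low-dimensional node vectors, so that indirect (interprocedural) dependence survives in the vector space.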
“…For this example, incorporating the caller/callee information may help boost effectiveness. Beyond that, more advanced static-analysis-based code embedding approaches (e.g., using data flows and control flows) [55], [65] would be beneficial for capturing code-change semantics and thus deserve exploration in future studies.…”
Section: Future Directions For Comment Update
confidence: 99%
“…Wan et al. 45 discussed the effectiveness of CodeBERT and GraphCodeBERT, respectively, for preprocessing in code characterization. Zhang et al. 46 separated the semantics from the syntax of source code, obtaining better interpretability and generality. Sui et al. 47 proposed the Flow2Vec model, which precisely preserves program dependencies (also known as value flows) by embedding the control flow and alias-aware data flow of programs into a low-dimensional vector space through approximating high-order proximity.…”
Section: Code Embedding
confidence: 99%
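The core idea the citing papers attribute to this work, separating a code snippet's semantics from its syntax so the two can be recombined across languages, can be sketched in a few lines. Everything below is a purely illustrative stand-in: the dimensions, the random linear "encoder" heads, and the feature vectors are invented, not the paper's model.

```python
import random

random.seed(0)

D_IN, D_SEM, D_SYN = 6, 4, 3  # hypothetical feature/embedding sizes


def make_weights(rows, cols):
    """Random weight matrix as a list of rows (a stand-in for learned parameters)."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply a weight matrix to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Two heads over one shared input: toy counterparts of the semantic
# and syntactic encoders a disentangled representation would use.
w_sem = make_weights(D_SEM, D_IN)
w_syn = make_weights(D_SYN, D_IN)


def encode(code_vec):
    return linear(w_sem, code_vec), linear(w_syn, code_vec)


java_vec = [random.gauss(0, 1) for _ in range(D_IN)]    # pretend Java snippet features
python_vec = [random.gauss(0, 1) for _ in range(D_IN)]  # pretend Python snippet features

sem_java, _ = encode(java_vec)
_, syn_py = encode(python_vec)

# Recombine Java semantics with Python syntax: the kind of
# cross-language composition a disentangled space is meant to allow.
recombined = sem_java + syn_py
print(len(recombined))  # 7
```

In a real model the two heads would be trained with objectives that force the semantic part to be language-invariant and push language-specific structure into the syntactic part; the sketch only shows the shape of the decomposition.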