2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) 2019
DOI: 10.1109/msr.2019.00078

Cross-Language Clone Detection by Learning Over Abstract Syntax Trees

citations
Cited by 44 publications
(38 citation statements)
references
References 19 publications
“…4. AST-based cross-language clone detection was proposed by Perez (2019) [24]. The approach is a semi-supervised machine learning model which is capable of detecting cross-language clones by employing a token level vector generation algorithm and tree-based skip-gram algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
“…Peng et al [14] proposed a novel "coding criterion" to build vector representations of nodes in ASTs, which have provided great progress in program analysis. BIGCODE [15] is a tool that can learn AST representations of given source codes with the help of the Skip-gram model [16].…”
Section: Program Vector Embeddings (mentioning)
confidence: 99%
“…In this step, each node in ASTs is trained and map to a real-valued vector, which contains each feature of the node. Inspired by BIGCODE tools [15], the Skip-gram model [16] is used to compute node vectors. The principle of this model is to use the currently known nodes to predict the context of them.…”
Section: Program Vector Embeddings (mentioning)
confidence: 99%
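
The skip-gram idea quoted above (use a known node to predict its context) can be illustrated with a minimal sketch. It is not the BIGCODE implementation: it assumes a node's context is only its parent and its direct children, uses Python's standard `ast` module as the parser, and the function name `ast_skipgram_pairs` is hypothetical. The resulting (target, context) pairs are the kind of training examples a skip-gram model would consume.

```python
import ast


def ast_skipgram_pairs(source):
    """Return (target, context) node-type pairs from a Python AST.

    Assumption: the context of a node is its parent and its direct
    children; the cited tools may use a wider window of ancestors
    and descendants.
    """
    tree = ast.parse(source)
    pairs = []
    for parent in ast.walk(tree):
        parent_type = type(parent).__name__
        for child in ast.iter_child_nodes(parent):
            child_type = type(child).__name__
            pairs.append((parent_type, child_type))  # parent as target, child as context
            pairs.append((child_type, parent_type))  # child as target, parent as context
    return pairs


pairs = ast_skipgram_pairs("x = 1")
```

Each pair would then be fed to an embedding model so that node types appearing in similar contexts receive similar vectors.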
“…The main idea of these approaches is to convert the source code written in different languages into common tree structures, such as eCST (enriched concrete syntax tree) [5], AST [27,28], and CodeDOM (Code Document Object Model) [29]. Then, the tree structures are converted into token sequences or vectors to improve the efficiency of similarity measure.…”
Section: Cross-language Source Code Similarity Detection Through Tree-based Intermediate Representation (mentioning)
confidence: 99%
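
As a rough illustration of converting a tree structure into a token sequence, the sketch below flattens an AST into a pre-order list of node-type names. It is an assumption-laden example rather than the method of any cited paper: it handles Python only (via the standard `ast` module), ignores identifiers and literal values, and the helper names `preorder_tokens` and `to_token_sequence` are hypothetical.

```python
import ast


def preorder_tokens(node):
    """Flatten an AST into a pre-order list of node-type names."""
    tokens = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        tokens.extend(preorder_tokens(child))
    return tokens


def to_token_sequence(source):
    """Parse Python source and return its node-type token sequence."""
    return preorder_tokens(ast.parse(source))


seq_a = to_token_sequence("a = 1")
seq_b = to_token_sequence("b = 2")
```

Because only node types are kept, `seq_a` and `seq_b` are identical even though the variable names and constants differ, which is the property that makes such sequences convenient for similarity measures.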
“…These approaches also ignore the structural features of the source code. Although the approach proposed in [28] combines the AST and LSTM to detect the similarity between Java and Python code, they are greatly affected by some complex obfuscation technologies, e.g., the commonly used adding redundant statements. Meanwhile, this kind of approach needs to train their models with a lot of code rather than detecting the code similarity directly.…”
Section: Cross-language Source Code Similarity Detection (mentioning)
confidence: 99%