2021
DOI: 10.48550/arxiv.2110.06773
Preprint
Leveraging Automated Unit Tests for Unsupervised Code Translation

Abstract: With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive to small changes; a single token can result in compilation failures or erroneous programs, unlike natural language…

Cited by 4 publications (6 citation statements); references 31 publications (44 reference statements).
“…There are many intolerable errors in the results of machine translation, which lead to compilation failures. Roziere et al. [34] proposed using automated unit tests to filter out faulty generated results. In this way, a parallel corpus can be obtained for fine-tuning the unsupervised translation model.…”
Section: Unsupervised Program Translation
confidence: 99%
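The filtering idea described in the statement above can be sketched in a few lines. This is a toy illustration, not the authors' actual pipeline: the candidate functions and the test pairs are hypothetical stand-ins, and the tests play the role of the automatically generated unit tests that the original source program is assumed to pass.

```python
# Sketch of unit-test filtering: keep only machine-generated translations
# that pass every automatically generated unit test, yielding
# (source, translation) pairs usable as a parallel corpus for fine-tuning.

def filter_by_tests(source, candidates, tests):
    """Return (source, candidate) pairs whose candidate passes every test.

    tests is a list of (args, expected_output) pairs, assumed to have been
    produced by running the original source program on generated inputs."""
    parallel_pairs = []
    for cand in candidates:
        try:
            ok = all(cand(*args) == expected for args, expected in tests)
        except Exception:  # a crash counts as a failed translation
            ok = False
        if ok:
            parallel_pairs.append((source, cand))
    return parallel_pairs

# Toy example: two candidate translations of an absolute-value function.
good = lambda x: x if x >= 0 else -x   # faithful translation
bad = lambda x: x                      # drops the negative branch
tests = [((3,), 3), ((-5,), 5), ((0,), 0)]
kept = filter_by_tests("abs_value source", [good, bad], tests)
# Only the faithful translation survives the test filter.
```

The faulty candidate fails on the input `-5`, so only the passing translation is paired with the source and added to the fine-tuning corpus.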
“…Their approach achieves outstanding effectiveness. Later on, they presented DOBF (Rozière et al. 2021a) and TransCoder-ST (Rozière et al. 2021b): the former pretrains a sequence-to-sequence model to revert a code obfuscation function, while the latter uses automatic test generation to select high-quality translation pairs for fine-tuning the pre-trained model. These works use Computational Accuracy (CA), a measure for evaluating translated code based on the ratio of test cases on which the input program and its translation produce similar outputs.…”
Section: Code Translation
confidence: 99%
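The Computational Accuracy metric mentioned above can be illustrated directly: the fraction of test inputs on which the translated program reproduces the original program's output. This is a minimal sketch under that reading of the metric; the functions and inputs are toy stand-ins, not the benchmark's actual programs.

```python
# Computational Accuracy (CA) sketch: the ratio of test inputs on which the
# translation's output matches the original program's output.

def computational_accuracy(original, translated, test_inputs):
    """Fraction of test inputs where both programs agree."""
    matches = 0
    for args in test_inputs:
        try:
            if translated(*args) == original(*args):
                matches += 1
        except Exception:
            pass  # a runtime error in the translation counts as a mismatch
    return matches / len(test_inputs)

original = lambda n: sum(range(n + 1))     # sum of 0..n, the source program
translated = lambda n: n * (n + 1) // 2    # correct closed-form translation
buggy = lambda n: n * (n - 1) // 2         # off-by-one translation

inputs = [(0,), (1,), (5,), (10,)]
computational_accuracy(original, translated, inputs)  # 1.0
computational_accuracy(original, buggy, inputs)       # 0.25 (agrees only at n=0)
```

Unlike exact-match or BLEU-style comparisons of the code text, CA rewards any translation that is behaviorally equivalent on the tests, even if it is written very differently from the source.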
“…For example, the neural code translation task employs machine translation technology to automatically translate Java into C#, reducing manual translation work for programmers [1][2][3]. Some previous approaches concentrate on improving performance by refining code representations, such as the pre-trained model CodeBERT [6].…”
Section: Introduction
confidence: 99%
“…In recent years, we have witnessed a dramatic rise in applying deep source code processing models (a.k.a. code models) to source code processing tasks [1][2][3][4][5].…”
Section: Introduction
confidence: 99%