Proceedings of the 44th International Conference on Software Engineering 2022
DOI: 10.1145/3510003.3510049
Multilingual training for software engineering

Cited by 33 publications (26 citation statements)
References 32 publications
“…Chen et al. [86] investigated the proposal by Ahmed and Devanbu [87] to pre-train DL models on multiple programming languages. The authors reported that multilingual models perform worse than monolingual ones.…”
Section: Related Work
confidence: 99%
“…Data Duplication. Prior studies [4,39] found that data duplicated across training and test sets can lead to unrealistically high model performance. However, those studies focus only on code completion and code summarization tasks.…”
Section: (RQ2) How Reliable Are Automated Code Generation Approaches?
confidence: 99%
“…Code generation models have been applied to a variety of tasks, including test generation [19], docstring generation [20], code search [17,21], type inference [22,23,24], and more [25]. We focus on the natural-language-to-code task (NL2Code): given a natural-language description of a function, complete the function body.…”
Section: The Natural Language To Code Task
confidence: 99%
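The NL2Code setting described in the statement above can be sketched concretely. This is a minimal illustrative example, not code from the cited papers: the prompt, the task name, and the reference completion below are all hypothetical stand-ins for what an NL2Code benchmark item looks like (a natural-language description plus a function signature, with the model expected to produce the body).

```python
# Hypothetical NL2Code benchmark item: the model is given `prompt`
# (signature plus natural-language docstring) and must generate the
# function body. `completion` stands in for a model's output.
prompt = (
    "def count_vowels(s):\n"
    '    """Return the number of vowels in the string s."""\n'
)

completion = "    return sum(1 for ch in s.lower() if ch in 'aeiou')\n"

# Benchmarks typically score a completion by executing the assembled
# function against held-out test cases.
namespace = {}
exec(prompt + completion, namespace)
assert namespace["count_vowels"]("Banana") == 3
```

Executing the assembled function against assertions like this is how functional-correctness metrics (e.g. pass rates over test suites) are commonly computed for code generation models.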
“…Other tasks. Although we focus specifically on benchmarks for the code generation task, many other tasks have been used to evaluate code generation models, including generating unit tests from code [19], code search [17,21], and type inference [22,23,24]. Lu et al. [20] propose a suite of evaluation datasets for ten tasks, including code translation, docstring generation, and code summarization.…”
Section: Related Work
confidence: 99%