“…To evaluate the effectiveness of TurduckenGen, we conduct experiments on two Turducken-style code datasets, Lyra and Pisces. Lyra (Liang et al. 2021) was collected from GitHub repositories and annotated by human annotators, whereas Pisces was constructed by manually translating the Python code in Lyra into corresponding Java code via crowdsourcing. TurduckenGen is compared to six state-of-the-art baselines, including Transformer (Vaswani et al. 2017), CodeBERT (Feng et al. 2020), GraphCodeBERT (Guo et al. 2021), GPT (Radford et al. 2019), CodeGPT (Lu et al. 2021a), and CodeT5 (Wang et al. 2021b), in terms of six automatic performance metrics (i.e., BLEU (Papineni et al. 2002), Weighted BLEU (Ren et al. 2020), Crystal BLEU (Eghbali and Pradel 2022), Syntax Match (Ren et al. 2020), Syntax Exact Match (Liang et al. 2021), and Code Executable (Liang et al. 2021)). The comparison results show that TurduckenGen outperforms these baselines.…”
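To make the headline metric concrete, the following is a minimal sketch of computing corpus-level BLEU over generated code with NLTK. The paper does not specify its exact tokenizer, n-gram weights, or smoothing, so the whitespace tokenization and smoothing method below are illustrative assumptions, not the authors' implementation.

```python
# Minimal BLEU sketch for generated code (assumptions: whitespace
# tokenization, default 4-gram weights, NLTK method1 smoothing).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One list of reference tokenizations per sample, and one hypothesis
# tokenization per sample (hypothetical examples, not from the datasets).
references = [["SELECT name FROM users WHERE id = ?".split()]]
hypotheses = ["SELECT name FROM users WHERE id = ?".split()]

smooth = SmoothingFunction().method1
score = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"BLEU: {score:.4f}")  # 1.0 for an exact token-level match
```

The other metrics refine this idea: Weighted BLEU and Crystal BLEU reweight or filter n-grams to better suit code, while Syntax Match, Syntax Exact Match, and Code Executable assess structural and functional correctness rather than surface overlap.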