Natural Language to Code Generation in Interactive Data Science Notebooks

Yin, Pengcheng; Li, Wen-Ding; Xiao, Kefan; Rao, Abhishek; Wen, Yeming; Shi, Kensen; Howland, J. J.; Bailey, Paige; Catasta, Michele; Michalewski, Henryk; Polozov, Alex; Sutton, Charles

doi:10.48550/arxiv.2212.09248

Cited by 2 publications

(1 citation statement)

References 0 publications

“…Many benchmarks have focused on code generation in APIs. Benchmarks like DS-1000 (Lai et al, 2023), ARCADE (Yin et al, 2022), NumpyEval, and PandasEval (Jain et al, 2022) focus on data science APIs. Other benchmarks measure using broader APIs or general software engineering tasks, such as JuICe (Agashe et al, 2019), APIBench (Patil et al, 2023), RepoBench, ODEX (Wang et al, 2022b), SWE-Bench (Jimenez et al, 2023), GoogleCodeRepo (Shrivastava et al, 2023), RepoEval, and Cocomic-Data.…”

Section: Code Generation (mentioning)

confidence: 99%

Contrastive Code Representation Learning

Jain, Jain, Zhang et al. 2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent work learns contextual representations of source code by reconstructing tokens from their context. For downstream semantic understanding tasks like code clone detection, these representations should ideally capture program functionality. However, we show that the popular reconstruction-based RoBERTa model is sensitive to source code edits, even when the edits preserve semantics. We propose ContraCode: a contrastive pre-training task that learns code functionality, not form. ContraCode pre-trains a neural network to identify functionally similar variants of a program among many non-equivalent distractors. We scalably generate these variants using an automated source-to-source compiler as a form of data augmentation. Contrastive pretraining outperforms RoBERTa on an adversarial code clone detection benchmark by 39% AUROC. Surprisingly, improved adversarial robustness translates to better accuracy over natural code; ContraCode improves summarization and TypeScript type inference accuracy by 2 to 13 percentage points over competitive baselines. All source is available at https://github.com/parasj/contracode.

Linguacodus: a synergistic framework for transformative code generation in machine learning pipelines

Trofimova, Sataev, Ustyuzhanin 2024

PeerJ Computer Science

In the ever-evolving landscape of machine learning, seamless translation of natural language descriptions into executable code remains a formidable challenge. This article introduces Linguacodus, an innovative framework designed to tackle this challenge by deploying a dynamic pipeline that iteratively transforms natural language task descriptions into code through high-level data-shaping instructions. The core of Linguacodus is a fine-tuned large language model, empowered to evaluate diverse solutions for various problems and select the most fitting one for a given task. This article details the fine-tuning process and sheds light on how natural language descriptions can be translated into functional code. Linguacodus represents a substantial leap towards automated code generation, effectively bridging the gap between task descriptions and executable code. It holds great promise for advancing machine learning applications across diverse domains. Additionally, we propose an algorithm capable of transforming a natural description of an ML task into code with minimal human interaction. In extensive experiments on a vast machine learning code dataset originating from Kaggle, we showcase the effectiveness of Linguacodus. The investigations highlight its potential applications across diverse domains, emphasizing its impact on applied machine learning in various scientific fields.
