Proceedings of the Third International Workshop on Automated Program Repair 2022
DOI: 10.1145/3524459.3527351
Can OpenAI's Codex Fix Bugs?

Abstract: Recently, we have seen a transition to data-driven techniques in Automated Program Repair (APR), in particular towards deep neural networks. This entails training on hundreds of thousands or even millions of non-executable code fragments. We would like to bring more attention to an aspect of code often neglected in Neural Program Repair (NPR), namely its execution. Code execution has several significant advantages. It allows for test-based evaluation of candidate fixes and can provide valuable information to …

Cited by 41 publications (8 citation statements) | References 79 publications
“…There are recent attempts [21, 50] to explore few-shot learning of large language models (LLMs) for APR. According to Prenner et al. [50], their Codex-based method achieves 46% EM, compared to the fine-tuned T5's 59%, on a random sample of 200 instances from TFix, showing that there is still a gap between few-shot learning and fine-tuning results. Besides, few-shot learning of LLMs requires more engineering effort for prompt tuning and post-processing [21], which is labor-intensive.…”
Section: Pretrained Language Models for Code
confidence: 99%
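The few-shot setup this statement refers to — showing the model a handful of (buggy, fixed) pairs before the target bug — can be sketched as below. The `### Buggy` / `### Fixed` delimiters and the helper name are illustrative assumptions, not the exact prompt format used by Prenner et al. [50].

```python
def build_few_shot_prompt(examples, buggy_code):
    """Assemble a few-shot APR prompt from (buggy, fixed) example pairs.

    The delimiters below are illustrative; real prompt formats for
    Codex-style models vary and typically need tuning.
    """
    parts = []
    for buggy, fixed in examples:
        parts.append(f"### Buggy\n{buggy}\n### Fixed\n{fixed}\n")
    # The target bug goes last, with the fix left for the model to complete.
    parts.append(f"### Buggy\n{buggy_code}\n### Fixed\n")
    return "\n".join(parts)

examples = [("x = x +- 1", "x = x + 1")]
prompt = build_few_shot_prompt(examples, "if x = 0: pass")
```

The model's completion after the final `### Fixed` delimiter is then taken as the candidate patch; the post-processing the statement mentions (truncating at the next delimiter, deduplicating samples) happens on that raw completion.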
“…AlphaRepair [68] is the first to directly use LLMs for cloze-style (or infilling-style) APR: it masks out the buggy code snippet and then uses CodeBERT [15] to directly fill in the correct code given the surrounding context. While AlphaRepair demonstrates the potential of encoder-only models for cloze-style APR, other studies [33, 55, 67] have looked into applying all three types of LLM architecture. FitRepair [66] further improves AlphaRepair via domain-specific fine-tuning and prompting strategies leveraging the plastic surgery hypothesis [6].…”
Section: Automated Program Repair
confidence: 99%
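The cloze-style reframing described above — replace the suspicious line with a mask token and let a masked language model fill it in — can be sketched with plain string handling; the actual CodeBERT inference is omitted, and the `<mask>` token string is an assumption borrowed from RoBERTa-style tokenizers.

```python
MASK = "<mask>"  # RoBERTa/CodeBERT-style mask token (assumption)

def make_cloze_input(lines, buggy_line_idx):
    """Replace the suspicious line with a mask token, keeping context."""
    masked = list(lines)
    masked[buggy_line_idx] = MASK
    return "\n".join(masked)

def apply_fill(lines, buggy_line_idx, prediction):
    """Substitute a model's predicted line for the masked one."""
    patched = list(lines)
    patched[buggy_line_idx] = prediction
    return "\n".join(patched)

src = ["def inc(x):", "    return x - 1"]
cloze = make_cloze_input(src, 1)  # line 2 masked, context preserved
patched = apply_fill(src, 1, "    return x + 1")
```

A fill-mask model would score candidate tokens for the masked position; validating the resulting candidates against the test suite is what turns the infilling into a repair tool.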
“…While the performance is impressive, one particular limitation of these techniques is the lack of guidance in patch generation. Prior work mainly treats the LLM as a black box and only queries the model via beam search [68] or sampling [33,55,67]. This means LLMs, while powerful, may still generate invalid patches given the current code context.…”
Section: Automated Program Repair
confidence: 99%
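The black-box query pattern criticized here — sample candidate patches from the model, then filter them by test execution — can be sketched with a stand-in sampler. `sample_patches` is a deterministic placeholder for an actual LLM call, and the single-assertion "test suite" is illustrative.

```python
import itertools

def sample_patches(n):
    """Stand-in sampler: cycles through fixed candidate bodies.

    A real system would instead draw n samples (or beams) from an LLM.
    """
    candidates = ["return a - b", "return a * b", "return a + b"]
    return list(itertools.islice(itertools.cycle(candidates), n))

def validate(patch):
    """Test-based validation: compile the candidate and run a check."""
    namespace = {}
    exec(f"def add(a, b):\n    {patch}", namespace)
    try:
        return namespace["add"](2, 3) == 5
    except Exception:
        return False

def repair(n=9):
    """Generate-and-validate loop: return the first patch that passes."""
    for patch in sample_patches(n):
        if validate(patch):
            return patch
    return None
```

The point made in the statement is that nothing in this loop guides the model itself: invalid candidates are only discarded after the fact, rather than prevented during generation.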
“…The first is cloze-style repair, i.e., reframing program repair as a cloze-style task and then invoking LLMs to predict the correct code with the help of repair patterns; examples include AlphaRepair [165], GAMMA [189], FitRepair [163], and Repilot [156]. The second is conversational repair, i.e., constructing complex prompts with various kinds of valuable information (e.g., buggy code, failure diagnostics, and even execution feedback) and then chatting with LLMs to generate correct patches; examples include Pearce et al. [116], TypeFix [117], RustAssistant [22], Zhang et al. [190], Prenner et al. [119], Sobania et al. [133], and Napoli et al. [108]. Such repair routes usually require LLMs capable of processing long-text prompts and human-like conversations, and thus predominantly employ powerful LLMs with billion-level parameters, such as ChatGPT and GPT-4.…”
Section: Zero-shot
confidence: 99%
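The conversational route described above — feed failure diagnostics back into the prompt and re-query until the tests pass — can be sketched with a mocked model. `query_llm` is a canned stand-in for a real chat API, and the prompt wording is an assumption, not any specific system's protocol.

```python
def run_tests(patch):
    """Execute the candidate and return (passed, diagnostic)."""
    ns = {}
    try:
        exec(patch, ns)
        assert ns["inc"](1) == 2
        return True, ""
    except AssertionError:
        return False, "inc(1) != 2"
    except Exception as e:
        return False, repr(e)

def query_llm(prompt):
    """Placeholder for a chat-model call: canned replies keyed on feedback."""
    if "inc(1) != 2" in prompt:
        return "def inc(x):\n    return x + 1"
    return "def inc(x):\n    return x - 1"  # first attempt is wrong

def conversational_repair(buggy, max_turns=3):
    """Repair loop: append execution feedback to the prompt each turn."""
    prompt = f"Fix this function:\n{buggy}"
    for _ in range(max_turns):
        patch = query_llm(prompt)
        passed, diag = run_tests(patch)
        if passed:
            return patch
        prompt += f"\nThat patch failed: {diag}. Try again."
    return None
```

The execution feedback in the second turn is what distinguishes this route from one-shot prompting: the diagnostic string gives the model concrete evidence about why the previous patch failed.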