Proceedings of the Third International Workshop on Automated Program Repair 2022
DOI: 10.1145/3524459.3527351
Can OpenAI's Codex Fix Bugs?

Abstract: Recently, we have seen a transition to data-driven techniques in Automated Program Repair (APR), in particular towards deep neural networks. This entails training on hundreds of thousands or even millions of non-executable code fragments. We would like to bring more attention to an aspect of code often neglected in Neural Program Repair (NPR), namely its execution. Code execution has several significant advantages. It allows for test-based evaluation of candidate fixes and can provide valuable information to …

Cited by 41 publications (8 citation statements) | References 79 publications
“…There are recent attempts [21, 50] to explore few-shot learning of large language models (LLMs) for APR. According to Prenner et al. [50], their Codex-based method achieves 46% EM, compared to the fine-tuned T5's 59%, on a random sample of 200 instances from TFix, showing that there is still a gap between few-shot learning and fine-tuning results. Besides, few-shot learning of LLMs requires more engineering effort for prompt tuning and post-processing [21], which is labor-intensive.…”
Section: Pretrained Language Models for Code
confidence: 99%
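The few-shot setup this statement refers to — showing the model a handful of (buggy, fixed) pairs before the target bug — can be sketched as below. The `### Buggy` / `### Fixed` delimiters and the helper name are illustrative assumptions, not the exact prompt format used by Prenner et al. [50].

```python
def build_few_shot_prompt(examples, buggy_code):
    """Assemble a few-shot APR prompt from (buggy, fixed) example pairs.

    The delimiters below are illustrative; real prompt formats for
    Codex-style models vary and typically need tuning.
    """
    parts = []
    for buggy, fixed in examples:
        parts.append(f"### Buggy\n{buggy}\n### Fixed\n{fixed}\n")
    # The target bug goes last, with the fix left for the model to complete.
    parts.append(f"### Buggy\n{buggy_code}\n### Fixed\n")
    return "\n".join(parts)

examples = [("x = x +- 1", "x = x + 1")]
prompt = build_few_shot_prompt(examples, "if x = 0: pass")
```

The model's completion after the final `### Fixed` delimiter is then taken as the candidate patch; the post-processing the statement mentions (truncating at the next delimiter, deduplicating samples) happens on that raw completion.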
“…AlphaRepair [68] is the first to directly use LLMs for cloze-style (or infilling-style) APR: it masks out the buggy code snippet and then uses CodeBERT [15] to directly fill in the correct code given the surrounding context. While AlphaRepair demonstrates the potential of encoder-only models for cloze-style APR, other studies [33, 55, 67] have looked into applying all three types of LLM architecture. FitRepair [66] further improves AlphaRepair via domain-specific fine-tuning and prompting strategies leveraging the plastic surgery hypothesis [6].…”
Section: Automated Program Repair
confidence: 99%
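The cloze-style reframing described above — replace the suspicious line with a mask token and let a masked language model fill it in — can be sketched with plain string handling; the actual CodeBERT inference is omitted, and the `<mask>` token string is an assumption borrowed from RoBERTa-style tokenizers.

```python
MASK = "<mask>"  # RoBERTa/CodeBERT-style mask token (assumption)

def make_cloze_input(lines, buggy_line_idx):
    """Replace the suspicious line with a mask token, keeping context."""
    masked = list(lines)
    masked[buggy_line_idx] = MASK
    return "\n".join(masked)

def apply_fill(lines, buggy_line_idx, prediction):
    """Substitute a model's predicted line for the masked one."""
    patched = list(lines)
    patched[buggy_line_idx] = prediction
    return "\n".join(patched)

src = ["def inc(x):", "    return x - 1"]
cloze = make_cloze_input(src, 1)  # line 2 masked, context preserved
patched = apply_fill(src, 1, "    return x + 1")
```

A fill-mask model would score candidate tokens for the masked position; validating the resulting candidates against the test suite is what turns the infilling into a repair tool.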
“…While the performance is impressive, one particular limitation of these techniques is the lack of guidance in patch generation. Prior work mainly treats the LLM as a black box and only queries the model via beam search [68] or sampling [33,55,67]. This means LLMs, while powerful, may still generate invalid patches given the current code context.…”
Section: Automated Program Repair
confidence: 99%
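The black-box query pattern criticized here — sample candidate patches from the model, then filter them by test execution — can be sketched with a stand-in sampler. `sample_patches` is a deterministic placeholder for an actual LLM call, and the single-assertion "test suite" is illustrative.

```python
import itertools

def sample_patches(n):
    """Stand-in sampler: cycles through fixed candidate bodies.

    A real system would instead draw n samples (or beams) from an LLM.
    """
    candidates = ["return a - b", "return a * b", "return a + b"]
    return list(itertools.islice(itertools.cycle(candidates), n))

def validate(patch):
    """Test-based validation: compile the candidate and run a check."""
    namespace = {}
    exec(f"def add(a, b):\n    {patch}", namespace)
    try:
        return namespace["add"](2, 3) == 5
    except Exception:
        return False

def repair(n=9):
    """Generate-and-validate loop: return the first patch that passes."""
    for patch in sample_patches(n):
        if validate(patch):
            return patch
    return None
```

The point made in the statement is that nothing in this loop guides the model itself: invalid candidates are only discarded after the fact, rather than prevented during generation.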
“…The first is cloze-style repair, i.e., reframing program repair as a cloze-style task and then invoking LLMs to predict the correct code with the help of repair patterns; examples include AlphaRepair [165], GAMMA [189], FitRepair [163], and Repilot [156]. The second is conversational repair, i.e., constructing complex prompts with various kinds of valuable information (e.g., buggy code, failure diagnostics, and even execution feedback) and then chatting with LLMs to generate correct patches; examples include Pearce et al. [116], TypeFix [117], RustAssistant [22], Zhang et al. [190], Prenner et al. [119], Sobania et al. [133], and Napoli et al. [108]. Such repair routes usually require LLMs capable of processing long-text prompts and human-like conversations, and thus predominantly employ powerful LLMs with billion-level parameters, such as ChatGPT and GPT-4.…”
Section: Zero-shot
confidence: 99%
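The conversational route described above — feed failure diagnostics back into the prompt and re-query until the tests pass — can be sketched with a mocked model. `query_llm` is a canned stand-in for a real chat API, and the prompt wording is an assumption, not any specific system's protocol.

```python
def run_tests(patch):
    """Execute the candidate and return (passed, diagnostic)."""
    ns = {}
    try:
        exec(patch, ns)
        assert ns["inc"](1) == 2
        return True, ""
    except AssertionError:
        return False, "inc(1) != 2"
    except Exception as e:
        return False, repr(e)

def query_llm(prompt):
    """Placeholder for a chat-model call: canned replies keyed on feedback."""
    if "inc(1) != 2" in prompt:
        return "def inc(x):\n    return x + 1"
    return "def inc(x):\n    return x - 1"  # first attempt is wrong

def conversational_repair(buggy, max_turns=3):
    """Repair loop: append execution feedback to the prompt each turn."""
    prompt = f"Fix this function:\n{buggy}"
    for _ in range(max_turns):
        patch = query_llm(prompt)
        passed, diag = run_tests(patch)
        if passed:
            return patch
        prompt += f"\nThat patch failed: {diag}. Try again."
    return None
```

The execution feedback in the second turn is what distinguishes this route from one-shot prompting: the diagnostic string gives the model concrete evidence about why the previous patch failed.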