Using Pre-Trained Models to Boost Code Review Automation

Tufano, Rosalia; Masiero, Simone; Mastropaolo, Antonio; Pascarella, Luca; Poshyvanyk, Denys; Bavota, Gabriele

doi:10.48550/arxiv.2201.06850

Cited by 2 publications

(17 citation statements)

References 31 publications


“…To tackle the problems, we pre-train CodeReviewer, an encoder-decoder transformer model. Different from Tufano et al [40]'s work, CodeReviewer is pre-trained on a large dataset in code review scenario, consisting of code diff hunks and code review comments. We propose four pre-training tasks, including diff tag prediction, denoising code diff, denoising review comment, and review comment generation to make CodeReviewer better understand code diffs and generate review comments.…”

Section: Pull Requests In Github (mentioning)

confidence: 99%

“…The input is still a code change, i.e., X = {D(C_0, C_1)}, with its context. In some previous works [18,40,41], researchers use the changed code as input but not the code diff, without taking into account that review comments have to focus on the changed part. It's not recommended for reviewers to give suggestions to the code context which has not been revised.…”

Section: Code Review Generation (mentioning)

confidence: 99%
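
The distinction drawn in the statement above, between giving a model the changed code and giving it the code diff, can be illustrated with a minimal sketch. It relies only on Python's standard `difflib` module. The function name `diff_hunk` and the toy snippet are hypothetical and are not taken from any of the cited papers, which may build their diff inputs differently.

```python
import difflib


def diff_hunk(old_code: str, new_code: str, context: int = 1) -> list[str]:
    """Return the unified-diff hunk lines for a code change.

    Hypothetical helper for illustration only; not the cited authors' pipeline.
    """
    old_lines = old_code.splitlines()
    new_lines = new_code.splitlines()
    diff = list(
        difflib.unified_diff(old_lines, new_lines, lineterm="", n=context)
    )
    # The first two lines are the '---' / '+++' file headers; drop them and
    # keep the '@@' hunk header plus the context (' '), removed ('-') and
    # added ('+') lines. An unchanged input yields an empty list.
    return diff[2:]


old = "def add(a, b):\n    return a - b\n"
new = "def add(a, b):\n    return a + b\n"

# "Changed code" input: the model sees only the new version.
changed_code_input = new

# "Code diff" input: the model sees which lines were removed and added.
code_diff_input = "\n".join(diff_hunk(old, new))
print(code_diff_input)
```

With the diff form, the removed and added lines are explicitly marked, which is the information a review comment is expected to focus on.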

“…To demonstrate the superiority of our multilingual code review related pre-training dataset and carefully designed pre-training tasks, we compare our CodeReviewer model with three baselines, including a state-of-the-art (SOTA) model architecture Transformer [42] trained from scratch and two pre-trained models: T5 for code review [40] and CodeT5 [43].…”

Section: Baseline Models (mentioning)

confidence: 99%

“…Many researchers have explored ways to assist reviewers and committers (i.e., code authors) to reduce their workload in the code review process, such as recommending the best reviewer [9,37], recommending or generating the possible review comments [18,40,41] and even revising the code before submitting it for review [40]. This paper shares the same goal to automate some specific tasks related to code review.…”

Section: Introduction (mentioning)

confidence: 99%

“…We demonstrate that it cannot generate any meaningful comment in the review generation task based on our evaluation. Tufano et al [40] attempt to use a pre-trained model for code review automation. However, their pre-training dataset is collected from Stack Overflow and CodeSearchNet [21], which is not directly related to code review process.…”

Section: Introduction (mentioning)

confidence: 99%


CodeReviewer: Pre-Training for Automating Code Review Activities

Li, Lü, Guo et al. 2022. Preprint.

Code review is an essential part of the software development lifecycle since it aims at guaranteeing the quality of code. Modern code review activities necessitate developers viewing, understanding and even running the programs to assess logic, functionality, latency, style and other factors. It turns out that developers have to spend far too much time reviewing the code of their peers. Accordingly, there is significant demand to automate the code review process. In this research, we focus on utilizing pre-training techniques for the tasks in the code review scenario. We collect a large-scale dataset of real-world code changes and code reviews from open-source projects in nine of the most popular programming languages. To better understand code diffs and reviews, we propose CodeReviewer, a pre-trained model that utilizes four pre-training tasks tailored specifically for the code review scenario. To evaluate our model, we focus on three key tasks related to code review activities, including code change quality estimation, review comment generation and code refinement. Furthermore, we establish a high-quality benchmark dataset based on our collected data for these three tasks and conduct comprehensive experiments on it. The experimental results demonstrate that our model outperforms the previous state-of-the-art pre-training approaches in all tasks. Further analysis shows that our proposed pre-training tasks and the multilingual pre-training dataset benefit the model on the understanding of code changes and reviews.

CodeEditor: Learning to Edit Source Code with Pre-trained Models

Li, Li, Li et al. 2022. Preprint.

Developers often perform repetitive code editing activities (up to 70%) for various reasons (e.g., code refactoring) during software development. Many deep learning (DL) models are applied to automate code editing by learning from the code editing history. Among DL-based models, pre-trained code editing models have achieved the state-of-the-art (SOTA) results. Pre-trained models are first pre-trained with pre-training tasks and then fine-tuned with the code editing task. Existing pre-training tasks are mainly code infilling tasks (e.g., masked language modeling), which are derived from the natural language processing field and are not designed for automatic code editing. In this paper, we propose a novel pre-training task specialized in code editing and present an effective pre-trained code editing model named CodeEditor. Compared to previous code infilling tasks, our pre-training task further improves the performance and generalization ability of code editing models. Specifically, we collect many real-world code snippets as the ground truth and use a powerful generator to rewrite them into natural but inferior versions. Then, we pre-train our CodeEditor to edit inferior versions into the corresponding ground truth, to learn edit patterns. We conduct experiments on four code editing datasets and evaluate the pre-trained CodeEditor in three settings (i.e., fine-tuning, few-shot, and zero-shot). (1) In the fine-tuning setting, we train the pre-trained CodeEditor with four datasets and evaluate it on the test data. CodeEditor outperforms the SOTA baselines by 15%, 25.5%, 9.4%, and 26.6% on the four datasets. (2) In the few-shot setting, we train the pre-trained CodeEditor with limited data and evaluate it on the test data. CodeEditor performs substantially better than all baselines, even outperforming baselines that are fine-tuned with all data. (3) In the zero-shot setting, we evaluate the pre-trained CodeEditor on the test data without fine-tuning. CodeEditor correctly edits 1,113 programs while the SOTA baselines cannot work. The results demonstrate the superiority of our pre-training task and show that the pre-trained CodeEditor is more effective in automatic code editing.
