2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) 2021
DOI: 10.1109/icse43902.2021.00041
Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks

Abstract: Deep learning (DL) techniques are gaining more and more attention in the software engineering community. They have been used to support several code-related tasks, such as automatic bug fixing and code comments generation. Recent studies in the Natural Language Processing (NLP) field have shown that the Text-To-Text Transfer Transformer (T5) architecture can achieve state-of-the-art performance for a variety of NLP tasks. The basic idea behind T5 is to first pre-train a model on a large and generic dataset usi…
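
The paradigm described in the abstract (generic pre-training followed by task-specific fine-tuning) can be illustrated with a minimal, hypothetical sketch using the Hugging Face transformers library; the t5-small checkpoint, the "summarize code:" prefix, and the toy code/comment pair below are illustrative assumptions, not the paper's actual setup.

# Minimal sketch of T5's pre-train/fine-tune idea applied to a code task.
# Assumptions (not from the paper): the t5-small checkpoint, the task prefix,
# and the single toy training pair.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

# Start from a generically pre-trained T5 checkpoint.
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One toy code/comment pair standing in for a fine-tuning dataset.
source = "summarize code: public int add(int a, int b) { return a + b; }"
target = "Returns the sum of two integers."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# One supervised fine-tuning step: T5 frames every task as text-to-text,
# so the loss is ordinary sequence-to-sequence cross-entropy.
outputs = model(**inputs, labels=labels)
outputs.loss.backward()

# After fine-tuning, the same interface generates output for any code-related task.
generated = model.generate(**inputs, max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))

In practice the model would be trained over many such pairs with an optimizer; this sketch only shows the shape of a single text-to-text step.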

Cited by 160 publications (73 citation statements)
References 48 publications
“…CuBERT employs BERT's powerful masked language modeling objective to derive a generic code-specific representation, and CodeBERT further adds a replaced token detection task (Clark et al., 2020) […] encoder-decoder models based on T5 for programming language pre-training and support a more comprehensive set of tasks. Some emerging work (Clement et al., 2020; Mastropaolo et al., 2021; Elnaggar et al., 2021) in the recent literature also explores the T5 framework on code, but they focus only on a limited subset of generation tasks and do not support understanding tasks like ours. Apart from these, PLBART (Ahmad et al., 2021), based on another encoder-decoder model, BART, can also support both understanding and generation tasks.…”
Section: Related Work
confidence: 99%
“…For instance, there have been numerous investigations into techniques for code summarization, from early approaches using text retrieval [11] through to more recent ML-based approaches that are often framed as a Neural Machine Translation (NMT) problem (e.g., see recent works by Mastropaolo et al [17] and Haque et al [12]). Generally, deep learning approaches for code summarization try to infer information that is not explicitly in the code, being trained on code/comment pairs (such as LeClair and McMillan's dataset [14]) and evaluated on regular (i.e., not decompiled) source code.…”
Section: Reading List: Related Work
confidence: 99%
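
The code/comment pairs mentioned in the excerpt above are the training signal for such NMT-style summarizers. Below is a hypothetical sketch of how pairs like these could be mined from Python sources with the standard ast module; it is only an illustration of the idea, not how LeClair and McMillan's dataset [14] was actually built.

# Hypothetical sketch: mine (code, comment) pairs from documented functions.
# Real code-summarization datasets apply far more careful filtering; this only
# illustrates the structure of the training data.
import ast
import textwrap

def extract_pairs(source: str):
    """Yield (code, summary) pairs for every documented function in the source."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                summary = doc.strip().splitlines()[0]  # first docstring line as target comment
                code = ast.unparse(node)               # function source as model input
                # (a real pipeline would usually strip the docstring from the input code)
                yield code, summary

example = textwrap.dedent('''
    def add(a, b):
        """Return the sum of two numbers."""
        return a + b
''')

for code, comment in extract_pairs(example):
    print("CODE:", code)
    print("COMMENT:", comment)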
“…Other pretrained transformers used on source code include CodeT5 (Wang et al., 2021b), CodeTrans (Elnaggar et al., 2021), PyMT5 (Clement et al., 2020), CuBERT (Kanade et al., 2020), PLBART, ProphetNet-X (Qi et al., 2021), CoTexT (Phan et al., 2021), T5-Code (Mastropaolo et al., 2021), GraphCodeBERT, and AlphaCode (Li et al., 2022). Pretrained GPT-style models for source code generation include CodeGPT and GPT-Codex (Chen et al., 2021a).…”
Section: Pretrained Transformer Models
confidence: 99%