2022
DOI: 10.48550/arxiv.2206.14390
Preprint

Diet Code is Healthy: Simplifying Programs for Pre-Trained Models of Code

Zhaowei Zhang,
Hongyu Zhang,
Beijun Shen
et al.

Abstract: Pre-trained code representation models such as CodeBERT have demonstrated superior performance in a variety of software engineering tasks, yet they are often heavy in computational complexity, which grows quadratically with the length of the input sequence. Our empirical analysis of CodeBERT's attention reveals that CodeBERT pays more attention to certain types of tokens and statements, such as keywords and data-relevant statements. Based on these findings, we propose Diet-CodeBERT, which aims at lightweight leverage of large pre-trained mo…
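As a rough illustration of the attention analysis mentioned in the abstract, the sketch below is a minimal, hypothetical example (not the authors' code). It loads the public microsoft/codebert-base checkpoint with the Hugging Face transformers library and extracts the per-layer attention matrices; each layer and head holds an L × L matrix, which is why the cost grows quadratically with the input length L.

```python
# Minimal sketch (assumed setup, not the paper's code): extract CodeBERT's
# attention matrices. Each layer/head stores an L x L matrix, hence the
# quadratic growth with input length L noted in the abstract.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_attentions=True)
model.eval()

code = "def add(a, b):\n    return a + b"  # toy input, purely illustrative
inputs = tokenizer(code, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Tuple of 12 tensors, one per layer, each of shape (batch, heads, L, L).
attentions = torch.stack(outputs.attentions)
print(attentions.shape)  # e.g. (12, 1, 12, L, L)
```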

Cited by 1 publication (4 citation statements)
References: 30 publications

“…Specifically, the higher the attention weights, the more attention that is paid by the model. Therefore, there have been many prior studies that employ the attention weights of pre-trained programming language models to explain model predictions [46,49,55]. Prior studies calculate the feature importance of each token by averaging the attention weights of all layers and heads.…”
Section: Attention-based Analysis
confidence: 99%
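The averaging procedure described in the statement above can be sketched as follows. This is a hypothetical illustration, under the assumption that attention weights are averaged over all layers, heads, and query positions to yield one importance score per token; names such as token_importance are illustrative and not taken from the cited works.

```python
# Hypothetical sketch of attention-based token importance: average the
# attention weights over all layers, heads, and query positions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_attentions=True)
model.eval()

inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt")
with torch.no_grad():
    attn = torch.stack(model(**inputs).attentions)  # (layers, 1, heads, L, L)

# Average over layers (dim 0), heads (dim 2), and query positions, keeping
# one score per attended token.
token_importance = attn.mean(dim=(0, 2))[0].mean(dim=0)  # shape (L,)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in sorted(zip(tokens, token_importance.tolist()),
                         key=lambda pair: -pair[1])[:5]:
    print(f"{tok}\t{score:.4f}")
```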
“…In fact, pre-trained code generation models contain multiple encoder and decoder layers. Although averaging attention weights to explain encoder-based code models is widely employed by prior works [46,49,55], Wan et al. [46] have shown that there is great variability between different layers. However, it remains unclear how to determine which attention weights are more important for model inference.…”
Section: Attention-based Analysis
confidence: 99%
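To make the layer-variability point concrete, a sketch such as the one below could compute the same token-importance score separately for each layer and then measure how much the scores differ across layers. This is again an assumed setup for illustration, not the cited authors' implementation.

```python
# Hypothetical sketch: compute per-layer token importance and measure how
# much it varies across layers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_attentions=True)
model.eval()

inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt")
with torch.no_grad():
    attn = torch.stack(model(**inputs).attentions)  # (layers, 1, heads, L, L)

# Per-layer importance: average over heads and query positions only.
per_layer = attn[:, 0].mean(dim=1).mean(dim=1)  # (layers, L)

# Standard deviation across layers, per token: large values indicate that
# different layers disagree on which tokens matter.
print(per_layer.std(dim=0))
```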