Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
DOI: 10.18653/v1/2022.emnlp-main.279

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Cited by 16 publications (4 citation statements)
References: 0 publications
“…However, this line of methods produces sparse weight matrices, requiring specific hardware support. On the other hand, structured pruning (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023) prunes away structures such as neurons, weight matrix blocks, or layers. Most previous works on structured pruning have focused on encoder-based models (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023), which remove attention heads, columns, and rows of weight matrices using different importance score metrics, including magnitudes or Hessians of weight matrices, and the L0 loss.…”
Section: Related Work
confidence: 99%
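The "sparse weight matrices" the excerpt above attributes to unstructured methods come from zeroing individual weights by some importance criterion. A minimal magnitude-based sketch in PyTorch follows; the layer size and the 90% sparsity target are illustrative assumptions, not values from the paper, which uses a second-order (Hessian-based) criterion rather than plain magnitudes.

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero the `sparsity` fraction of entries with the smallest absolute value, in place."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    # The k-th smallest absolute value acts as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    weight.mul_(mask)
    return weight

layer = torch.nn.Linear(768, 3072)  # a BERT-base-sized feed-forward projection (illustrative)
with torch.no_grad():
    magnitude_prune_(layer.weight, sparsity=0.9)

print(f"nonzero fraction: {layer.weight.ne(0).float().mean().item():.2f}")  # ~0.10
```

The resulting matrix keeps its original shape but is mostly zeros, which is why the excerpt notes that such models only see speedups on hardware or kernels with sparse-matrix support.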
“…On the other hand, structured pruning (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023) prunes away structures such as neurons, weight matrix blocks, or layers. Most previous works on structured pruning have focused on encoder-based models (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023), which remove attention heads, columns, and rows of weight matrices using different importance score metrics, including magnitudes or Hessians of weight matrices, and the L0 loss. However, structured pruning of generative models has been significantly underinvestigated, with only a few available works (Lagunas et al., 2021; Yang et al., 2022; Santacroce et al., 2023).…”
Section: Related Work
confidence: 99%
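To make the contrast with the unstructured sketch concrete: structured pruning removes whole rows, columns, or attention heads, so the surviving weight matrices are smaller and dense. A hedged sketch of head-level pruning for one multi-head attention block is given below; it scores each head by the L2 norm of its slice of the projection matrices (a simple magnitude-style stand-in for the Hessian- or L0-based scores the excerpt mentions), and the module sizes and head counts are illustrative assumptions rather than the paper's setup.

```python
import torch
import torch.nn as nn

def prune_heads(q: nn.Linear, k: nn.Linear, v: nn.Linear, out: nn.Linear,
                num_heads: int, heads_to_keep: int):
    head_dim = q.out_features // num_heads

    # Importance of head h: combined L2 norm of its rows in Q/K/V and its columns in the output proj.
    scores = []
    for h in range(num_heads):
        rows = slice(h * head_dim, (h + 1) * head_dim)
        score = sum(m.weight[rows].norm() for m in (q, k, v)) + out.weight[:, rows].norm()
        scores.append(score.item())
    keep = torch.tensor(scores).topk(heads_to_keep).indices.sort().values

    # Indices of the rows/columns belonging to the surviving heads.
    idx = torch.cat([torch.arange(h * head_dim, (h + 1) * head_dim) for h in keep.tolist()])

    def shrink(linear: nn.Linear, dim: int) -> nn.Linear:
        w = linear.weight.index_select(dim, idx)
        new = nn.Linear(w.shape[1], w.shape[0], bias=linear.bias is not None)
        with torch.no_grad():
            new.weight.copy_(w)
            if linear.bias is not None:
                new.bias.copy_(linear.bias.index_select(0, idx) if dim == 0 else linear.bias)
        return new

    # Q/K/V lose output rows (dim 0); the output projection loses input columns (dim 1).
    return shrink(q, 0), shrink(k, 0), shrink(v, 0), shrink(out, 1)

# Example: a BERT-base-sized attention block, pruned from 12 heads down to 8.
d_model, num_heads = 768, 12
q, k, v, out = (nn.Linear(d_model, d_model) for _ in range(4))
q, k, v, out = prune_heads(q, k, v, out, num_heads=num_heads, heads_to_keep=8)
print(q.weight.shape, out.weight.shape)  # torch.Size([512, 768]) torch.Size([768, 512])
```

Because the pruned layers are simply smaller dense matrices, the speedup requires no special sparse-kernel support, which is the trade-off the excerpt draws against unstructured methods.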
“…Recently, BERT (Bidirectional Encoder Representations from Transformers), a neural network based on the transformer architecture and designed to model data sequences such as natural language text, has been gaining prominence in natural language processing [12]. BERT has been applied to various NLP tasks, such as machine translation [31,32], language modeling [33], and chatbots [34]. Its training process uses next-sentence prediction to understand the relationship between two sentences, making it useful for question answering.…”
Section: Related Work
confidence: 99%
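The excerpt mentions BERT's next-sentence prediction (NSP) pre-training objective. A minimal sketch of querying a pretrained BERT's NSP head with the Hugging Face transformers library is shown below; the checkpoint name and example sentences are illustrative assumptions, and the library must be installed separately.

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

first = "The cat sat on the mat."
second = "It was purring contentedly."
inputs = tokenizer(first, second, return_tensors="pt")  # sentence pair -> token_type_ids mark A vs. B
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2)

# Per the transformers convention, index 0 means "B follows A", index 1 means "B is a random sentence".
prob_is_next = logits.softmax(dim=-1)[0, 0].item()
print(f"probability that the second sentence follows the first: {prob_is_next:.2f}")
```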
“…Foundational LLMs and their fine-tuned counterparts have become a cornerstone of NLP research in recent years [10,11]. Extensive literature has addressed the significance of data in shaping the performance of language models across various languages and tasks.…”
Section: Literature Review
confidence: 99%