Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
DOI: 10.18653/v1/2022.emnlp-main.279

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Cited by 16 publications (4 citation statements)
References: 0 publications
“…However, this line of methods produces sparse weight matrices, requiring specific hardware support. On the other hand, structured pruning (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023) prunes away structures such as neurons, weight matrix blocks, or layers. Most previous works on structured pruning have focused on encoder-based models (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023), which remove attention heads, columns, and rows of weight matrices using different importance score metrics, including magnitudes or Hessians of weight matrices, and the L0 loss.…”
Section: Related Work
confidence: 99%
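The "sparse weight matrices" the excerpt above attributes to unstructured methods come from zeroing individual weights by some importance criterion. A minimal magnitude-based sketch in PyTorch follows; the layer size and the 90% sparsity target are illustrative assumptions, not values from the paper, which uses a second-order (Hessian-based) criterion rather than plain magnitudes.

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero the `sparsity` fraction of entries with the smallest absolute value, in place."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    # The k-th smallest absolute value acts as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    weight.mul_(mask)
    return weight

layer = torch.nn.Linear(768, 3072)  # a BERT-base-sized feed-forward projection (illustrative)
with torch.no_grad():
    magnitude_prune_(layer.weight, sparsity=0.9)

print(f"nonzero fraction: {layer.weight.ne(0).float().mean().item():.2f}")  # ~0.10
```

The resulting matrix keeps its original shape but is mostly zeros, which is why the excerpt notes that such models only see speedups on hardware or kernels with sparse-matrix support.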
“…On the other hand, structured pruning (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023) prunes away structures such as neurons, weight matrix blocks, or layers. Most previous works on structured pruning have focused on encoder-based models (Xia et al., 2022; Kwon et al., 2022; Kurtic et al., 2023), which remove attention heads, columns, and rows of weight matrices using different importance score metrics, including magnitudes or Hessians of weight matrices, and the L0 loss. However, structured pruning of generative models has been significantly underinvestigated, with only a few available works (Lagunas et al., 2021; Yang et al., 2022; Santacroce et al., 2023).…”
Section: Related Work
confidence: 99%
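To make the contrast with the unstructured sketch concrete: structured pruning removes whole rows, columns, or attention heads, so the surviving weight matrices are smaller and dense. A hedged sketch of head-level pruning for one multi-head attention block is given below; it scores each head by the L2 norm of its slice of the projection matrices (a simple magnitude-style stand-in for the Hessian- or L0-based scores the excerpt mentions), and the module sizes and head counts are illustrative assumptions rather than the paper's setup.

```python
import torch
import torch.nn as nn

def prune_heads(q: nn.Linear, k: nn.Linear, v: nn.Linear, out: nn.Linear,
                num_heads: int, heads_to_keep: int):
    head_dim = q.out_features // num_heads

    # Importance of head h: combined L2 norm of its rows in Q/K/V and its columns in the output proj.
    scores = []
    for h in range(num_heads):
        rows = slice(h * head_dim, (h + 1) * head_dim)
        score = sum(m.weight[rows].norm() for m in (q, k, v)) + out.weight[:, rows].norm()
        scores.append(score.item())
    keep = torch.tensor(scores).topk(heads_to_keep).indices.sort().values

    # Indices of the rows/columns belonging to the surviving heads.
    idx = torch.cat([torch.arange(h * head_dim, (h + 1) * head_dim) for h in keep.tolist()])

    def shrink(linear: nn.Linear, dim: int) -> nn.Linear:
        w = linear.weight.index_select(dim, idx)
        new = nn.Linear(w.shape[1], w.shape[0], bias=linear.bias is not None)
        with torch.no_grad():
            new.weight.copy_(w)
            if linear.bias is not None:
                new.bias.copy_(linear.bias.index_select(0, idx) if dim == 0 else linear.bias)
        return new

    # Q/K/V lose output rows (dim 0); the output projection loses input columns (dim 1).
    return shrink(q, 0), shrink(k, 0), shrink(v, 0), shrink(out, 1)

# Example: a BERT-base-sized attention block, pruned from 12 heads down to 8.
d_model, num_heads = 768, 12
q, k, v, out = (nn.Linear(d_model, d_model) for _ in range(4))
q, k, v, out = prune_heads(q, k, v, out, num_heads=num_heads, heads_to_keep=8)
print(q.weight.shape, out.weight.shape)  # torch.Size([512, 768]) torch.Size([768, 512])
```

Because the pruned layers are simply smaller dense matrices, the speedup requires no special sparse-kernel support, which is the trade-off the excerpt draws against unstructured methods.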
“…Recently, BERT (Bidirectional Encoder Representations from Transformers), a neural network based on the transformer architecture and designed to model data sequences such as natural language text, has been gaining prominence in natural language processing [12]. BERT has been applied to various NLP tasks, such as machine translation [31,32], language modeling [33], and chatbots [34]. Its training process uses next-sentence prediction to understand the relationship between two sentences, making it useful for question answering.…”
Section: Related Work
confidence: 99%
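The excerpt mentions BERT's next-sentence prediction (NSP) pre-training objective. A minimal sketch of querying a pretrained BERT's NSP head with the Hugging Face transformers library is shown below; the checkpoint name and example sentences are illustrative assumptions, and the library must be installed separately.

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

first = "The cat sat on the mat."
second = "It was purring contentedly."
inputs = tokenizer(first, second, return_tensors="pt")  # sentence pair -> token_type_ids mark A vs. B
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2)

# Per the transformers convention, index 0 means "B follows A", index 1 means "B is a random sentence".
prob_is_next = logits.softmax(dim=-1)[0, 0].item()
print(f"probability that the second sentence follows the first: {prob_is_next:.2f}")
```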
“…Foundational LLMs and their fine-tuned counterparts have become a cornerstone of NLP research in recent years [10,11]. Extensive literature has addressed the significance of data in shaping the performance of language models across various languages and tasks.…”
Section: Literature Review
confidence: 99%