2022
DOI: 10.48550/arxiv.2203.07259
Preprint
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Abstract: Pre-trained Transformer-based language models have become a key building block for natural language processing (NLP) tasks. While these models are extremely accurate, they can be too large and computationally intensive to run on standard deployments. A variety of compression methods, including distillation, quantization, structured and unstructured pruning are known to be applicable to decrease model size and increase inference speed. In this context, this paper's contributions are two-fold. We begin with an i…

Cited by 8 publications (23 citation statements)
References 15 publications
“…Singh and Alistarh [2020] investigated a diagonal block-wise approximation with a predefined block size B, which reduces storage cost from O(d²) to O(Bd), and showed that this approach can lead to strong results when pruning CNNs. Kurtic et al [2022] proposed a formula for block pruning, together with a set of non-trivial optimizations to efficiently compute the block inverse, which allowed them to scale the approach for the first time to large language models.…”
Section: Background and Problem Setup
confidence: 99%
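As the quotation notes, the block-diagonal approximation cuts inverse-Hessian storage from O(d²) to O(Bd). A minimal sketch of that accounting (toy sizes, not the authors' implementation):

```python
# Toy illustration (not the authors' code) of the storage savings from a
# diagonal block-wise inverse-Hessian approximation with block size B.
d = 1_000  # total number of weights (toy size)
B = 50     # block size; assumes B divides d for simplicity

# Dense inverse Hessian: one d x d matrix -> O(d^2) entries.
full_entries = d * d

# Block-diagonal approximation: d // B blocks, each B x B -> B * d entries.
block_entries = (d // B) * B * B

print(f"dense:  {full_entries:,} entries")   # dense:  1,000,000 entries
print(f"blocks: {block_entries:,} entries")  # blocks: 50,000 entries
```

With B ≪ d this is the difference between quadratic and linear memory in the number of weights, which is what makes the approach feasible for large language models.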
“…where E_Q ∈ ℝ^{|Q|×d} is a matrix of basis vectors for each weight in Q. The corresponding saliency score for the group of weights Q and the update δw*_Q of remaining weights is [Kurtic et al, 2022]:…”
Section: Background and Problem Setup
confidence: 99%
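The formula itself is clipped from the quotation above. For orientation only, the standard Optimal Brain Surgeon-style group saliency and weight update from the second-order pruning literature take the following form (notation assumed: H is the Hessian approximation, w_Q the weights in group Q, E_Q as defined in the quotation; this is not reproduced from the clipped text):

```latex
% OBS-style group saliency and update (sketch, standard literature form)
\rho_Q = \tfrac{1}{2}\, w_Q^{\top}
         \left( E_Q H^{-1} E_Q^{\top} \right)^{-1} w_Q,
\qquad
\delta w^{*} = -\, H^{-1} E_Q^{\top}
               \left( E_Q H^{-1} E_Q^{\top} \right)^{-1} w_Q .
```

Here E_Q H^{-1} E_Q^⊤ is simply the |Q|×|Q| submatrix of H^{-1} indexed by Q, so the saliency measures the loss increase from zeroing the group while the update optimally compensates the remaining weights.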