2022
DOI: 10.48550/arxiv.2204.09656
Preprint

A Fast Post-Training Pruning Framework for Transformers

Cited by 2 publications (2 citation statements)
References 0 publications
“…We study the performance of ZipLM when applied purely in one-shot, without any retraining. In this setting we compare against the recently proposed state-of-the-art method of Kwon et al. (2022), which is based on several heuristics: Fisher-based mask search, mask rearrangement, and mask tuning. More accurate versions of some of those aspects arise naturally in our pruning framework.…”
Section: Additional Validation (mentioning)
confidence: 99%
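The Kwon et al. (2022) method referenced above scores prunable units (such as attention heads) with a diagonal empirical Fisher approximation before searching for a mask. As a rough illustration only, the sketch below shows one way such Fisher-based importance scores could be computed in PyTorch; `model`, `loss_fn`, `dataloader`, and the `head_masks` keyword argument are hypothetical placeholders, not the paper's actual interface.

```python
import torch

def fisher_head_importance(model, loss_fn, dataloader, num_layers, num_heads, device="cpu"):
    """Hypothetical sketch: diagonal empirical Fisher importance for attention-head masks.

    Assumes the model accepts a (num_layers, num_heads) mask tensor that scales
    each head's output; this interface is an illustration, not the authors' API.
    """
    head_masks = torch.ones(num_layers, num_heads, device=device, requires_grad=True)
    importance = torch.zeros(num_layers, num_heads, device=device)
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        loss = loss_fn(model(inputs, head_masks=head_masks), labels)
        (grad,) = torch.autograd.grad(loss, head_masks)
        # Diagonal empirical Fisher: accumulate the squared gradient of the
        # mini-batch loss with respect to each mask variable.
        importance += grad.detach() ** 2
    return importance / max(len(dataloader), 1)
```

A mask search would then keep the highest-scoring heads under a latency or FLOP budget; this search, together with mask rearrangement and mask tuning, is the part the ZipLM comparison above characterizes as heuristic.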
“…Introducing sparsity can reduce memory consumption and accelerate inference [35]. Pruning has also been used as an approach to reduce inference cost [71]. Quantization and sparse updates can reduce the training cost [85].…”
Section: Introduction, 1.1 Background (mentioning)
confidence: 99%