2022 · Preprint
DOI: 10.48550/arxiv.2204.00408

Structured Pruning Learns Compact and Accurate Models

Abstract: The growing size of neural language models has led to increased attention to model compression. The two predominant approaches are pruning, which gradually removes weights from a pre-trained model, and distillation, which trains a smaller compact model to match a larger one. Pruning methods can significantly reduce the model size but rarely achieve speedups as large as distillation does. Distillation methods, however, require large amounts of unlabeled data and are expensive to train. In this work, we propose a task-spe…
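Since the abstract contrasts the two compression routes only in prose, here is a minimal, illustrative sketch of each; the function names, the L2-norm importance heuristic, and the temperature value are assumptions for illustration, not the paper's method:

```python
import torch
import torch.nn.functional as F

def prune_rows(linear: torch.nn.Linear, keep_ratio: float) -> None:
    """Structured pruning sketch: zero out whole output units (rows) in place.

    Importance here is each row's L2 norm -- a simplistic stand-in for the
    learned masks used in the paper.
    """
    scores = linear.weight.norm(dim=1)            # one score per output unit
    k = max(1, int(keep_ratio * scores.numel()))
    mask = torch.zeros_like(scores)
    mask[scores.topk(k).indices] = 1.0
    with torch.no_grad():
        linear.weight.mul_(mask.unsqueeze(1))     # zero the pruned rows
        if linear.bias is not None:
            linear.bias.mul_(mask)

def distill_step(student, teacher, x, optimizer, T: float = 2.0) -> float:
    """Distillation sketch: one step matching the teacher's softened logits."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The trade-off the abstract describes follows directly: pruning edits the trained model in place, while distillation must train the small model from the teacher's outputs, which is why it needs large amounts of data and compute.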

Cited by 8 publications (19 citation statements) · References 29 publications

“…To handle the corner case in which all structures in a module are pruned, we skip the module by passing its input through as its output. While we could switch to a quite recent pruning method [22] that exploits both coarse-grained and fine-grained strategies for state-of-the-art performance, we argue that our framework is agnostic to the pruning method, so we keep the pruning method simple.…”
Section: A. Technical Details of Pruning
confidence: 99%
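A minimal sketch of the corner case handled above (the wrapper class and flag name are hypothetical, not from the citing work):

```python
import torch

class PrunableSublayer(torch.nn.Module):
    """Wraps a module so it degenerates to the identity once fully pruned."""

    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner
        self.fully_pruned = False  # set True when every structure inside is removed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.fully_pruned:
            return x  # skip the module: feed the input through as the output
        return self.inner(x)
```

This identity shortcut only makes sense for sub-layers whose input and output shapes match, e.g. residual Transformer blocks.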
“…A number of approaches have been proposed to identify a good structure at a given scale, including dynamic search [12], layer dropping [21], and pruning [11]. In this work, we adopt pruning to assign structures A_k to the candidates because of its known advantages in knowledge distillation [22]. Concretely, following previous work [11], pruning starts from the least important parameters/features, ranked by importance scores that are approximated by masking the parameterized structures.…”
Section: Specification
confidence: 99%
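The masking-based importance estimate mentioned above can be sketched as a gradient taken through a mask fixed at 1; the function name and the MSE loss below are illustrative stand-ins, not the cited paper's implementation:

```python
import torch
import torch.nn.functional as F

def mask_importance(structure_outputs: list, target: torch.Tensor) -> list:
    """Score each structure by |dL/dm|, where m is a mask initialized to 1."""
    masks = [torch.ones((), requires_grad=True) for _ in structure_outputs]
    combined = sum(m * out for m, out in zip(masks, structure_outputs))
    loss = F.mse_loss(combined, target)
    grads = torch.autograd.grad(loss, masks)
    return [g.abs().item() for g in grads]  # lower score = pruned earlier

# Example: three hypothetical structure outputs scored against a random target.
outs = [torch.randn(4, 8) for _ in range(3)]
scores = mask_importance(outs, target=torch.randn(4, 8))
print(scores)  # prune the structure with the smallest score first
```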