2020
DOI: 10.48550/arxiv.2006.12139
Preprint
Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning

Abstract: As deep neural networks are growing in size and being increasingly deployed to more resource-limited devices, there has been a recent surge of interest in network pruning methods, which aim to remove less important weights or activations of a given network. A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning, and thus we can benefit from the reduction in memory and computation only at inference time. However, reducing the traini…
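As a point of reference for the pruning setting the abstract describes, below is a minimal PyTorch sketch of structural (channel-level) pruning with a simple L1 filter-magnitude criterion. The layer, the criterion, and the 50% keep ratio are illustrative assumptions only; the paper itself replaces such hand-crafted criteria with set-based, task-adaptive meta-learned masks.

```python
import torch
import torch.nn as nn

# A toy convolutional layer; shapes are illustrative, not from the paper.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# Score each output channel by the L1 norm of its filter weights.
scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # shape: (16,)

# Keep roughly the top half of channels; zero out the rest with a binary mask,
# i.e. prune entire filters rather than individual weights.
keep = scores >= scores.median()
with torch.no_grad():
    conv.weight.mul_(keep.float().view(-1, 1, 1, 1))
    conv.bias.mul_(keep.float())
```

Zeroing whole filters like this keeps the layer's shape but removes entire channels' contributions, which is what makes structural pruning translate into real memory and compute savings.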

Cited by 1 publication (2 citation statements)
References 22 publications (30 reference statements)
“…Note that an independent and concurrent work [56] also proposes to utilize meta-learning for rapid structural pruning of neural networks. We highlight the main differences below: 1) [56] relies on a centralized meta-learning method in which the nodes are required to submit data to a central platform, whereas we consider a more realistic distributed setup and propose a new federated meta-learning approach tailored to the specific efficiency problem in our work. 2) [56] takes a stochastic approach and learns a task-specific Bernoulli distribution for mask generation, which, however, could generate masks that lead to significant performance degradation.…”
Section: Related Work
confidence: 99%
“…We highlight the main differences below: 1) [56] relies on a centralized meta-learning method in which the nodes are required to submit data to a central platform, whereas we consider a more realistic distributed setup and propose a new federated meta-learning approach tailored to the specific efficiency problem in our work. 2) [56] takes a stochastic approach and learns a task-specific Bernoulli distribution for mask generation, which, however, could generate masks that lead to significant performance degradation. In stark contrast, we develop a deterministic approach by learning a task-specific channel gating module, and we also provide theoretical foundations by carrying out a thorough convergence analysis of the proposed federated meta-learning algorithm.…”
Section: Related Work
confidence: 99%
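To make the contrast drawn in these citation statements concrete, here is a minimal sketch of the two mask-generation strategies. The variable names, channel count, and 0.5 threshold are illustrative assumptions, not either paper's actual implementation; in practice the Bernoulli sampling would also need a relaxation (e.g. a straight-through or Gumbel-style estimator) to stay trainable.

```python
import torch
import torch.nn as nn

num_channels = 16
# Task-specific parameters that a meta-learner would produce per task.
logits = nn.Parameter(torch.zeros(num_channels))

# Stochastic masking in the spirit of [56]: sample a per-channel Bernoulli
# mask from learned probabilities. Each draw may prune different channels,
# so an unlucky sample can remove channels the task actually needs.
probs = torch.sigmoid(logits.detach())
stochastic_mask = torch.bernoulli(probs)

# Deterministic gating in the spirit of the citing work: threshold a learned
# gate once, so the same channels are pruned on every forward pass.
deterministic_mask = (probs > 0.5).float()
```

Because the deterministic gate yields the same sub-network on every forward pass, its behavior is reproducible and more amenable to analysis, which is consistent with the citing work's emphasis on proving convergence of its federated meta-learning algorithm.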