2020
DOI: 10.48550/arxiv.2005.11282
Preprint

PruneNet: Channel Pruning via Global Importance

Ashish Khetan, Zohar Karnin

Abstract: Channel pruning is one of the predominant approaches for accelerating deep neural networks. Most existing pruning methods either train from scratch with a sparsity inducing term such as group lasso, or prune redundant channels in a pretrained network and then fine tune the network. Both strategies suffer from some limitations: the use of group lasso is computationally expensive, difficult to converge and often suffers from worse behavior due to the regularization bias. The methods that start with a pretrained …
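
The abstract contrasts two common strategies; as a concrete illustration of the second one (prune redundant channels in a pretrained network, then fine-tune), here is a minimal sketch of magnitude-based channel pruning for a single convolutional layer. This is not PruneNet's actual criterion: the L1-norm importance score, the prune_conv_channels helper, and the keep_ratio parameter are illustrative assumptions, and a real pipeline would also have to adjust the downstream layers and fine-tune the slimmed network afterwards.

```python
# Illustrative sketch only (assumed L1-norm importance, not PruneNet's method):
# keep the output channels of a pretrained Conv2d whose filters have the
# largest L1 norm, and copy their weights into a slimmer layer.
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Return a smaller Conv2d keeping the highest-importance output channels."""
    with torch.no_grad():
        # Per-output-channel importance = L1 norm of that channel's filter weights.
        importance = conv.weight.abs().sum(dim=(1, 2, 3))
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        keep_idx = torch.argsort(importance, descending=True)[:n_keep]

        pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                           stride=conv.stride, padding=conv.padding,
                           bias=conv.bias is not None)
        pruned.weight.copy_(conv.weight[keep_idx])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep_idx])
    return pruned

# Usage: halve one layer's channels; the next layer's in_channels would then
# be adjusted to match and the whole model fine-tuned.
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
slim = prune_conv_channels(layer, keep_ratio=0.5)
print(slim.out_channels)  # 64
```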

Cited by 4 publications (6 citation statements)
References 26 publications

“…1(Right). Our RPG networks outperform SOTA pruning methods such as [27,31,62,63,64,28]. Specifically, at the Feature Similarity.…”
Section: Discussion (mentioning)
confidence: 86%
“…Specifically, with only half the ResNet34 backbone parameters, we achieve the same ImageNet top-1 accuracy. We also outperform model pruning methods such as Knapsack [27] and PruneNet [28].…”
Section: Introduction (mentioning)
confidence: 93%
“…much wider than bottom layers and thus have a higher representation capability. Prior studies [20,40,15] have shown that bottom layers have less redundancy than top layers in existing architectures. When tackling multiple tasks together, top layers may have sufficient capacity to learn diverse features while the bottom layers are easily distracted by different tasks during training.…”
Section: Discussion (mentioning)
confidence: 99%
“…A potential explanation is that top layers of a modern CNN architecture are usually much wider than bottom layers and thus have a higher representation capability. Prior studies [16], [17], [55] have shown that bottom layers have less redundancy than top layers in existing architectures. When tackling multiple domains together, top layers may have sufficient capacity to learn diverse features while the bottom layers are easily distracted by different domains during training.…”
Section: A Why the Bottom-specific Sharing Strategy Outperforms The T... (mentioning)
confidence: 99%