2020
DOI: 10.48550/arxiv.2005.11282
Preprint

PruneNet: Channel Pruning via Global Importance

Ashish Khetan, Zohar Karnin

Abstract: Channel pruning is one of the predominant approaches for accelerating deep neural networks. Most existing pruning methods either train from scratch with a sparsity inducing term such as group lasso, or prune redundant channels in a pretrained network and then fine tune the network. Both strategies suffer from some limitations: the use of group lasso is computationally expensive, difficult to converge and often suffers from worse behavior due to the regularization bias. The methods that start with a pretrained …
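
The abstract contrasts two common strategies; as a concrete illustration of the second one (prune redundant channels in a pretrained network, then fine-tune), here is a minimal sketch of magnitude-based channel pruning for a single convolutional layer. This is not PruneNet's actual criterion: the L1-norm importance score, the prune_conv_channels helper, and the keep_ratio parameter are illustrative assumptions, and a real pipeline would also have to adjust the downstream layers and fine-tune the slimmed network afterwards.

```python
# Illustrative sketch only (assumed L1-norm importance, not PruneNet's method):
# keep the output channels of a pretrained Conv2d whose filters have the
# largest L1 norm, and copy their weights into a slimmer layer.
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Return a smaller Conv2d keeping the highest-importance output channels."""
    with torch.no_grad():
        # Per-output-channel importance = L1 norm of that channel's filter weights.
        importance = conv.weight.abs().sum(dim=(1, 2, 3))
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        keep_idx = torch.argsort(importance, descending=True)[:n_keep]

        pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                           stride=conv.stride, padding=conv.padding,
                           bias=conv.bias is not None)
        pruned.weight.copy_(conv.weight[keep_idx])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep_idx])
    return pruned

# Usage: halve one layer's channels; the next layer's in_channels would then
# be adjusted to match and the whole model fine-tuned.
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
slim = prune_conv_channels(layer, keep_ratio=0.5)
print(slim.out_channels)  # 64
```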

Cited by 4 publications (6 citation statements)
References 26 publications

“…1(Right). Our RPG networks outperform SOTA pruning methods such as [27,31,62,63,64,28]. Specifically, at the Feature Similarity.…”
Section: Discussion (mentioning)
confidence: 86%
“…Specifically, with only half the ResNet34 backbone parameters, we achieve the same ImageNet top-1 accuracy. We also outperform model pruning methods such as Knapsack [27] and PruneNet [28].…”
Section: Introduction (mentioning)
confidence: 93%
“…much wider than bottom layers and thus have a higher representation capability. Prior studies [20,40,15] have shown that bottom layers have less redundancy than top layers in existing architectures. When tackling multiple tasks together, top layers may have sufficient capacity to learn diverse features while the bottom layers are easily distracted by different tasks during training.…”
Section: Discussion (mentioning)
confidence: 99%
“…A potential explanation is that top layers of a modern CNN architecture are usually much wider than bottom layers and thus have a higher representation capability. Prior studies [16], [17], [55] have shown that bottom layers have less redundancy than top layers in existing architectures. When tackling multiple domains together, top layers may have sufficient capacity to learn diverse features while the bottom layers are easily distracted by different domains during training.…”
Section: A Why the Bottom-specific Sharing Strategy Outperforms The T... (mentioning)
confidence: 99%