2018 25th IEEE International Conference on Image Processing (ICIP)
DOI: 10.1109/icip.2018.8451123
Online Filter Clustering and Pruning for Efficient Convnets

Abstract: Pruning filters is an effective method for accelerating deep neural networks (DNNs), but most existing approaches prune filters directly from a pre-trained network, which limits the achievable acceleration. Although each filter has its own effect in a DNN, if two filters are identical to each other, one of them can be pruned safely. In this paper, we add an extra cluster loss term to the loss function, which forces the filters in each cluster to become similar online. After training, we keep one filter in each cluster and prune the others an…
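The cluster-then-prune idea in the abstract can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' implementation: `cluster_loss` penalizes the squared distance of each (flattened) filter to its cluster centroid, pulling same-cluster filters together during training, and `prune_clusters` keeps one representative per cluster afterwards.

```python
import numpy as np

def cluster_loss(filters, assignments, num_clusters):
    """Extra loss term that pulls same-cluster filters together.

    filters: (N, D) array of flattened filter weights.
    assignments: (N,) cluster index per filter.
    Returns the total squared distance to cluster centroids, averaged over filters.
    """
    loss = 0.0
    for c in range(num_clusters):
        members = filters[assignments == c]
        if members.size == 0:
            continue
        centroid = members.mean(axis=0)
        loss += ((members - centroid) ** 2).sum()
    return loss / len(filters)

def prune_clusters(filters, assignments, num_clusters):
    """After training, keep one representative filter index per cluster."""
    kept = []
    for c in range(num_clusters):
        idx = np.flatnonzero(assignments == c)
        if idx.size:
            kept.append(idx[0])  # keep the first member; prune the rest
    return np.array(kept)
```

When the cluster loss is driven to zero, same-cluster filters are identical, so discarding all but one of them leaves the network's function unchanged (up to rewiring the following layer).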

Cited by 20 publications (9 citation statements)
References 26 publications
“…In this section, we evaluate our proposed filter pruning method on CIFAR10 and CIFAR100 benchmarks with the single-branch VGGNet-16 and multi-branch ResNet-56 and Resnet-110 networks. We compare our method with the previous methods, such as [6], [9], [12], [13], [27], [31], [32].…”
Section: Methods
confidence: 99%
“…The filters with a high level of similarity can be replaced, so that even if these filters are removed, the remaining filters can still take over their functions. Zhou et al. [14] measured the similarity between filters using clustering, and He et al. [15] used the geometric median. Duan et al. [16] adopted the Pearson correlation coefficient to measure the similarity between output feature maps, then decided which filters to remove based on the similarity information.…”
Section: Related Work
confidence: 99%
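The Pearson-correlation criterion mentioned in the excerpt above can be sketched in a few lines. This is a hypothetical NumPy illustration of the general idea (not Duan et al.'s code): two feature maps are flattened and their Pearson correlation computed; values near 1 indicate redundant filters that are candidates for pruning.

```python
import numpy as np

def feature_map_similarity(fmap_a, fmap_b):
    """Pearson correlation between two feature maps of equal shape.

    A value near 1 means the maps carry nearly the same information,
    so one of the producing filters is a pruning candidate.
    """
    a = fmap_a.ravel()
    b = fmap_b.ravel()
    return np.corrcoef(a, b)[0, 1]
```

Note that Pearson correlation is invariant to per-map scale and offset, so it flags filters whose outputs are redundant up to an affine transform, not only exact duplicates.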
“…Knowledge Distillation Layers: For the proposed method, we select the intermediate features from ResNet [45] and MobileNetV2 [46] networks with the following spatial sizes [H, W]: [56, 56], [28, 28], [14, 14] and [7, 7], analyzing L = 4 levels of depth. We assume that both Teacher and Student architectures share the same spatial sizes (in Width and Height, not in the Channel dimension) at some points in their architectures.…”
Section: B. Implementation Details
confidence: 99%
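The four distillation levels quoted above form a spatial pyramid in which each level halves the previous one (56 → 28 → 14 → 7). A minimal NumPy sketch (assumed here for illustration, not taken from the cited work) shows how 2×2 average pooling produces exactly that sequence of sizes:

```python
import numpy as np

def avg_pool2x(x):
    """2x2 average pooling with stride 2 on an (H, W) map.

    Halves each spatial dimension; H and W must be even.
    """
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```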