Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/480

Play and Prune: Adaptive Filter Pruning for Deep Model Compression

Abstract: While convolutional neural networks (CNN) have achieved impressive performance on various classification/recognition tasks, they typically consist of a massive number of parameters. This results in significant memory requirements as well as computational overhead. Consequently, there is a growing need for filter-level pruning approaches for compressing CNN-based models that not only reduce the total number of parameters but reduce the overall computation as well. We present a new min-max framework for filter-l…

Cited by 54 publications (26 citation statements)
References 2 publications
“…Network slimming [14] imposes sparsity regularization on the scaling factors in batch normalization layers during training, so as to identify and prune insignificant channels. Unlike these previous approaches, Play and Prune [15] allows the user to specify an error tolerance limit instead of a pruning ratio for each layer. Wang et al. [16] verify that pruning directly from randomly initialized weights can result in more diverse pruned structures with competitive performance.…”
Section: B. DNN Model Compression
confidence: 99%
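The quoted statement contrasts two concrete mechanisms: sparsity regularization on batch-normalization scaling factors (network slimming [14]) and an accuracy-tolerance-driven pruning budget (Play and Prune [15]). A minimal PyTorch-style sketch of both ideas follows; it assumes a CNN with BatchNorm2d layers, and the soft channel masking and helper names (`mask_channels_below`, `prune_until_tolerance`) are illustrative simplifications rather than the papers' actual implementations.

```python
import torch
import torch.nn as nn

def bn_sparsity_penalty(model: nn.Module, lam: float = 1e-4):
    """Network-slimming-style L1 penalty on BatchNorm scaling factors;
    added to the task loss so unimportant channels are driven toward zero."""
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def mask_channels_below(model: nn.Module, thr: float) -> int:
    """Soft-prune channels whose BN scaling factor is below `thr` by zeroing
    the scale and shift. Returns the number of channels masked."""
    masked = 0
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                drop = m.weight.abs() < thr
                m.weight[drop] = 0.0
                m.bias[drop] = 0.0
                masked += int(drop.sum())
    return masked

def prune_until_tolerance(model, evaluate, baseline_acc, tol=1.0, step=0.05, max_thr=1.0):
    """Play-and-Prune-flavoured stopping rule (sketch): raise the pruning threshold
    until validation accuracy drops by more than `tol` percentage points.
    `evaluate(model)` is assumed to return accuracy in percent."""
    thr = 0.0
    while thr < max_thr and baseline_acc - evaluate(model) <= tol:
        thr += step
        mask_channels_below(model, thr)
    return model
```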
“…Our choice is based on the following three reasons: (1) Unlike weight pruning [7], channel pruning produces hardware-friendly models without introducing irregular sparsity [11]. (2) The ℓ1-norm can be easily calculated to measure the importance of filters [13], while most pruning criteria [14], [15] can only be obtained during the actual training process. (3) Unlike [7], only one-shot pruning is adopted by FL-PQSU, as further pruning during federated training incurs additional overhead but contributes little to performance improvement [13].…”
Section: Structured Pruning
confidence: 99%
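The ℓ1-norm criterion referenced in the quote ([13]) is simple enough to show directly: score each output filter of a convolution by its sum of absolute weights and keep only the top fraction in a single one-shot pass, with no further pruning afterwards. The keep ratio and function names below are illustrative, not taken from FL-PQSU.

```python
import torch
import torch.nn as nn

def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Per-filter importance: sum of |w| over the in_channels x kH x kW slice."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def one_shot_keep_mask(conv: nn.Conv2d, keep_ratio: float = 0.7) -> torch.Tensor:
    """Boolean mask of filters to keep after a single (one-shot) ranking pass."""
    scores = l1_filter_scores(conv)
    k = max(1, int(keep_ratio * scores.numel()))
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[torch.topk(scores, k).indices] = True
    return mask

conv = nn.Conv2d(64, 128, kernel_size=3)
print(one_shot_keep_mask(conv).sum().item())  # 89 of 128 filters kept at keep_ratio=0.7
```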
“…Taylor [32], feature maps: $|\Delta C(H_{i,j})| = \left|\frac{\delta C}{\delta H_{i,j}} H_{i,j}\right|$; FPGM [57], weights: $W_{i,j^*} \in \operatorname{argmin}_{x \in \mathbb{R}^{n_{in} \times k \times k}} \sum_{j \in [1, n_{out}]} \lVert x - W_{i,j} \rVert_2$. Prune iteratively with regularization: Play and Prune [56], weights: $S_j = \sum_k |w_k|$; Auto-Balance [55], weights: $S_j = \sum_k |w_k|$. Prune iteratively, min reconstruction error: ThiNet, feature maps: …”
Section: Strategy
confidence: 99%
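The criteria tabulated above can each be computed in a few lines; the sketch below shows one plausible PyTorch reading of the Taylor, FPGM, and sum-of-absolute-weights scores. Normalizations and details differ across the cited papers, so treat these functions as illustrative rather than reference implementations.

```python
import torch

def taylor_scores(feature_map: torch.Tensor) -> torch.Tensor:
    """Taylor criterion on feature maps: |(dC/dH) * H| averaged over batch and
    spatial positions, one score per channel. Assumes feature_map.retain_grad()
    was called before loss.backward(), so feature_map.grad is populated."""
    return (feature_map.grad * feature_map).mean(dim=(0, 2, 3)).abs()

def fpgm_scores(weight: torch.Tensor) -> torch.Tensor:
    """FPGM-style redundancy: sum of L2 distances from each filter to all others.
    Filters with the smallest score lie closest to the geometric median and are
    the usual pruning candidates. weight shape: (n_out, n_in, k, k)."""
    flat = weight.detach().flatten(1)              # (n_out, n_in*k*k)
    return torch.cdist(flat, flat, p=2).sum(dim=1)

def abs_weight_scores(weight: torch.Tensor) -> torch.Tensor:
    """S_j = sum_k |w_k|, the weight-magnitude score used by the
    regularization-based rows (Play and Prune, Auto-Balance)."""
    return weight.detach().abs().flatten(1).sum(dim=1)
```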
“…"Prune iteratively" is a type of pruning that is done iteratively on a trained model that alternate between pruning and fine-tuning [32]. Pruning by regularization is usually done by adding a regularization term to the original loss function in order to leave the pruning process for the optimization [55,56]. Pruning by minimizing the reconstruction error is a family of algorithms that tries to minimize the difference of outputs between the pruned and the original model.…”
Section: Strategymentioning
confidence: 99%
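As a complement to the strategy descriptions above, here is a compact, method-agnostic sketch of the "prune iteratively" loop: alternate a small pruning step with a short fine-tuning phase until a target sparsity is reached. The callables `prune_step`, `finetune`, and `current_sparsity` are placeholders supplied by the caller; none of these names come from the cited papers.

```python
def iterative_prune(model, prune_step, finetune, current_sparsity,
                    target_sparsity=0.5, step_fraction=0.05, finetune_epochs=2):
    """Alternate pruning and fine-tuning until the target sparsity is reached."""
    while current_sparsity(model) < target_sparsity:
        prune_step(model, fraction=step_fraction)   # remove the least important filters
        finetune(model, epochs=finetune_epochs)     # recover accuracy before pruning more
    return model
```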