2021
DOI: 10.1007/s40747-020-00248-y

Knowledge from the original network: restore a better pruned network with knowledge distillation

Abstract: To deploy deep neural networks to edge devices with limited computation and storage costs, model compression is necessary for the application of deep learning. Pruning, as a traditional way of model compression, seeks to reduce the parameters of model weights. However, when a deep neural network is pruned, the accuracy of the network will significantly decrease. The traditional way to decrease the accuracy loss is fine-tuning. When too many parameters are pruned, the pruned network’s capacity is reduced heavily…
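
The recovery strategy the abstract describes, distilling knowledge from the original unpruned network into its pruned counterpart instead of relying on fine-tuning alone, can be sketched with a standard temperature-scaled distillation loss. This is a minimal illustration of the general idea, not the authors' exact method; the temperature, loss weighting, and function names below are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """Soft-target loss from the original (teacher) network plus the
    usual hard-label cross-entropy on the pruned (student) network."""
    # Temperature-softened output distributions of teacher and student.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence, scaled by T^2 as in the standard formulation.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage: the teacher stands in for the original dense network,
# the student for its pruned, reduced-capacity counterpart.
teacher_logits = torch.randn(8, 10)
student_logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```

Training the pruned network against the softened outputs of the original network is what lets the reduced-capacity model recover accuracy that fine-tuning on hard labels alone tends to miss.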

Cited by 21 publications (15 citation statements)
References 19 publications (19 reference statements)
“…The fundamental idea behind model compression is to create a sparse network by eliminating unwanted connections and weights. Various research on model compression uses weight pruning and quantization [1–3], low-rank factorization [4–6], and knowledge distillation [7–10]. Typically, quantization and low-rank factorization approaches are applied to pretrained models; however, knowledge distillation methods are suited only for training from scratch.…”
Section: Introduction (mentioning)
confidence: 99%
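
For concreteness, the "eliminating unwanted connections and weights" mentioned above is commonly done by magnitude pruning: the smallest-magnitude weights are zeroed out. The sketch below is an illustrative PyTorch routine, not the cited works' procedure; the in-place masking and the 80% sparsity level are assumptions.

```python
import torch
import torch.nn as nn

def magnitude_prune_(module: nn.Linear, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights of a linear layer in place."""
    with torch.no_grad():
        w = module.weight
        k = int(sparsity * w.numel())
        if k == 0:
            return
        # Threshold = k-th smallest absolute weight; everything below it is cut.
        threshold = w.abs().flatten().kthvalue(k).values
        mask = (w.abs() > threshold).float()
        w.mul_(mask)

layer = nn.Linear(128, 64)
magnitude_prune_(layer, sparsity=0.8)
print((layer.weight == 0).float().mean())  # roughly 0.8 of the connections removed
```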
“…With the continuous development of this technology, better knowledge distillation methods will continue to emerge. Moreover, Chen L et al. [24] noted that different knowledge distillation methods suit different neural network structures. Therefore, further research is needed to promote a deeper integration of retraining and knowledge distillation.…”
Section: Discussion (mentioning)
confidence: 99%
“…Chen L et al. [24] also put forward the idea of using knowledge distillation. Compared with theirs, the Combine-Net algorithm is based on the sub-net obtained after structural pruning, which has stronger universality and does not need special hardware support.…”
Section: Retraining Methods (mentioning)
confidence: 99%
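
Structural pruning, as contrasted here with the distillation-only approach, removes whole filters or channels, so the pruned sub-net is simply a smaller dense network and needs no special hardware support for sparse weights. A minimal sketch under that assumption (PyTorch Conv2d; adjusting the downstream layers that consume the removed channels is omitted):

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep the output channels with the largest L1 filter norms and
    return a genuinely smaller Conv2d (dense weights, no sparse kernels)."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    # L1 norm of each output filter decides which channels survive.
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.topk(norms, n_keep).indices.sort().values
    new_conv = nn.Conv2d(conv.in_channels, n_keep,
                         kernel_size=conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
    return new_conv

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
small = prune_conv_channels(conv, keep_ratio=0.25)
print(small.weight.shape)  # torch.Size([8, 16, 3, 3])
```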
“…However, although the existing KD frameworks have been proved beneficial in traditional machine learning problems such as classification and regression, applying it to recommendation is still challenging due to the data sparsity issue [15,18]. Besides, recent studies find that models with similar structures (e.g., encoder-decoder) are easier to transfer knowledge [2], while the tensor decomposition may widen the structural gap between the teacher and the student because it can be seen as a special linear layer prior to student's embedding layer.…”
Section: Introduction (mentioning)
confidence: 99%
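
The remark that tensor decomposition "can be seen as a special linear layer prior to student's embedding layer" can be made concrete with a low-rank factorized embedding: a small lookup table followed by a linear projection back to the full embedding dimension. The class name, vocabulary size, and rank below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Approximate a (vocab x dim) embedding table with a low-rank product:
    a small (vocab x rank) table followed by a linear map (rank x dim)."""
    def __init__(self, vocab_size: int, dim: int, rank: int):
        super().__init__()
        self.low_rank = nn.Embedding(vocab_size, rank)
        # The "special linear layer" sitting in front of the student's embedding space.
        self.project = nn.Linear(rank, dim, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.project(self.low_rank(token_ids))

full = nn.Embedding(50_000, 256)                 # 12.8M parameters
compact = FactorizedEmbedding(50_000, 256, 32)   # about 1.6M parameters
print(compact(torch.tensor([[1, 2, 3]])).shape)  # torch.Size([1, 3, 256])
```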