2021
DOI: 10.1007/s40747-020-00248-y

Knowledge from the original network: restore a better pruned network with knowledge distillation

Abstract: To deploy deep neural networks to edge devices with limited computation and storage costs, model compression is necessary for the application of deep learning. Pruning, as a traditional way of model compression, seeks to reduce the parameters of model weights. However, when a deep neural network is pruned, the accuracy of the network will significantly decrease. The traditional way to decrease the accuracy loss is fine-tuning. When too many parameters are pruned, the pruned network’s capacity is reduced heavily…
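
The recovery strategy the abstract describes, distilling knowledge from the original unpruned network into its pruned counterpart instead of relying on fine-tuning alone, can be sketched with a standard temperature-scaled distillation loss. This is a minimal illustration of the general idea, not the authors' exact method; the temperature, loss weighting, and function names below are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """Soft-target loss from the original (teacher) network plus the
    usual hard-label cross-entropy on the pruned (student) network."""
    # Temperature-softened output distributions of teacher and student.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence, scaled by T^2 as in the standard formulation.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage: the teacher stands in for the original dense network,
# the student for its pruned, reduced-capacity counterpart.
teacher_logits = torch.randn(8, 10)
student_logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```

Training the pruned network against the softened outputs of the original network is what lets the reduced-capacity model recover accuracy that fine-tuning on hard labels alone tends to miss.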

Cited by 21 publications (15 citation statements)
References 19 publications (19 reference statements)
“…The fundamental idea behind model compression is to create a sparse network by eliminating unwanted connections and weights. Various research on model compression uses weight pruning and quantization [1–3], low-rank factorization [4–6], and knowledge distillation [7–10]. Typically, quantization and low-rank factorization approaches are applied to pretrained models; however, knowledge distillation methods are suited only for training from scratch.…”
Section: Introduction (mentioning)
confidence: 99%
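
For concreteness, the "eliminating unwanted connections and weights" mentioned above is commonly done by magnitude pruning: the smallest-magnitude weights are zeroed out. The sketch below is an illustrative PyTorch routine, not the cited works' procedure; the in-place masking and the 80% sparsity level are assumptions.

```python
import torch
import torch.nn as nn

def magnitude_prune_(module: nn.Linear, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights of a linear layer in place."""
    with torch.no_grad():
        w = module.weight
        k = int(sparsity * w.numel())
        if k == 0:
            return
        # Threshold = k-th smallest absolute weight; everything below it is cut.
        threshold = w.abs().flatten().kthvalue(k).values
        mask = (w.abs() > threshold).float()
        w.mul_(mask)

layer = nn.Linear(128, 64)
magnitude_prune_(layer, sparsity=0.8)
print((layer.weight == 0).float().mean())  # roughly 0.8 of the connections removed
```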
“…With the continuous development of this technology, better knowledge distillation methods will continue to emerge. Moreover, Chen L et al. [24] noted that different knowledge distillation methods suit different neural network structures. Therefore, further research is needed to promote a deeper integration of retraining and knowledge distillation.…”
Section: Discussion (mentioning)
confidence: 99%
“…Chen L et al. [24] also put forward the idea of using knowledge distillation. Compared with theirs, the Combine-Net algorithm is based on the sub-net obtained after structural pruning, which has stronger universality and does not need special hardware support.…”
Section: Retraining Methods (mentioning)
confidence: 99%
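
Structural pruning, as contrasted here with the distillation-only approach, removes whole filters or channels, so the pruned sub-net is simply a smaller dense network and needs no special hardware support for sparse weights. A minimal sketch under that assumption (PyTorch Conv2d; adjusting the downstream layers that consume the removed channels is omitted):

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep the output channels with the largest L1 filter norms and
    return a genuinely smaller Conv2d (dense weights, no sparse kernels)."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    # L1 norm of each output filter decides which channels survive.
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.topk(norms, n_keep).indices.sort().values
    new_conv = nn.Conv2d(conv.in_channels, n_keep,
                         kernel_size=conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
    return new_conv

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
small = prune_conv_channels(conv, keep_ratio=0.25)
print(small.weight.shape)  # torch.Size([8, 16, 3, 3])
```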
“…However, although the existing KD frameworks have been proved beneficial in traditional machine learning problems such as classification and regression, applying it to recommendation is still challenging due to the data sparsity issue [15,18]. Besides, recent studies find that models with similar structures (e.g., encoder-decoder) are easier to transfer knowledge [2], while the tensor decomposition may widen the structural gap between the teacher and the student because it can be seen as a special linear layer prior to student's embedding layer.…”
Section: Introduction (mentioning)
confidence: 99%
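
The remark that tensor decomposition "can be seen as a special linear layer prior to student's embedding layer" can be made concrete with a low-rank factorized embedding: a small lookup table followed by a linear projection back to the full embedding dimension. The class name, vocabulary size, and rank below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Approximate a (vocab x dim) embedding table with a low-rank product:
    a small (vocab x rank) table followed by a linear map (rank x dim)."""
    def __init__(self, vocab_size: int, dim: int, rank: int):
        super().__init__()
        self.low_rank = nn.Embedding(vocab_size, rank)
        # The "special linear layer" sitting in front of the student's embedding space.
        self.project = nn.Linear(rank, dim, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.project(self.low_rank(token_ids))

full = nn.Embedding(50_000, 256)                 # 12.8M parameters
compact = FactorizedEmbedding(50_000, 256, 32)   # about 1.6M parameters
print(compact(torch.tensor([[1, 2, 3]])).shape)  # torch.Size([1, 3, 256])
```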