2021
DOI: 10.1609/aaai.v35i3.16356

Progressive Network Grafting for Few-Shot Knowledge Distillation

Abstract: Knowledge distillation has demonstrated encouraging performance in deep model compression. Most existing approaches, however, require massive labeled data to accomplish the knowledge transfer, making model compression a cumbersome and costly process. In this paper, we investigate the practical few-shot knowledge distillation scenario, where we assume only a few samples without human annotations are available for each category. To this end, we introduce a principled dual-stage distillation scheme tailored…
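The few-shot, annotation-free setting the abstract describes can be illustrated with a standard label-free distillation step. The sketch below is a minimal PyTorch illustration assuming generic teacher and student classifiers; the temperature, optimizer, and function names are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: label-free knowledge distillation on a few unlabeled samples.
# The student mimics the teacher's softened outputs, so no ground-truth labels
# are needed. Networks, temperature, and optimizer are illustrative assumptions.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, images, optimizer, temperature=4.0):
    """Run one distillation step on a small batch of unlabeled images."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(images)          # soft targets from the teacher
    s_logits = student(images)

    # Match the softened teacher and student distributions with KL divergence.
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```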

Cited by 28 publications (16 citation statements) · References 40 publications (39 reference statements)
“…Some studies have improved layer-wise distillation by proposing cross-distillation, which effectively reduces the estimation error of layer-wise distillation by cross-training the hidden layers of the teacher and student networks [8]. Beyond the cross-distillation model, a principled dual-stage distillation scheme for small-sample settings has also been proposed, in which student modules are first grafted into the teacher network and trained, then the trained student modules are spliced together and grafted into the teacher network, and finally the teacher network is replaced entirely [9]. Some of the above methods add extra convolutional layers to the compressed network during training, which increases the complexity of the network structure.…”
Section: Related Work
confidence: 99%
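The grafting procedure quoted above can be made concrete with a short sketch. The following PyTorch code is a hedged illustration, not the paper's implementation: the block partitioning, the MSE objective against the teacher's outputs, and the training loop are simplifying assumptions. It shows the core idea of splicing one student block at a time into the otherwise frozen teacher and training it so the hybrid network reproduces the teacher's behavior.

```python
# Hedged sketch of block-wise grafting: student block `idx` is spliced into the
# frozen teacher, and only that block is trained so the hybrid mimics the full
# teacher. Assumes each student block matches its teacher block's input/output
# shapes and that `loader` yields small batches of unlabeled images.
import torch
import torch.nn as nn
import torch.nn.functional as F

def graft_and_train_block(teacher_blocks, student_blocks, idx, loader, epochs=1):
    """Train one grafted student block inside a teacher/student hybrid."""
    hybrid = nn.Sequential(*[
        student_blocks[i] if i == idx else teacher_blocks[i]
        for i in range(len(teacher_blocks))
    ])

    # Freeze everything, then unfreeze only the grafted student block.
    for p in hybrid.parameters():
        p.requires_grad = False
    for p in student_blocks[idx].parameters():
        p.requires_grad = True

    teacher = nn.Sequential(*teacher_blocks).eval()
    optimizer = torch.optim.Adam(student_blocks[idx].parameters(), lr=1e-3)

    for _ in range(epochs):
        for images in loader:
            with torch.no_grad():
                target = teacher(images)        # full-teacher reference output
            loss = F.mse_loss(hybrid(images), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student_blocks[idx]

# Trained blocks can then be spliced together and, in later stages, replace the
# teacher entirely, mirroring the dual-stage procedure described above.
```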
“…Deep neural networks are widely used in various computer vision tasks [1]-[4] and have achieved remarkable results [5], [6]. However, current state-of-the-art deep models suffer from high energy consumption and high operating and storage costs, which greatly hinder their deployment in resource-constrained settings [7]-[9]. To address this problem, many works have been proposed to compress neural networks into more lightweight models.…”
Section: Introduction
confidence: 99%