2021
DOI: 10.1007/s11263-021-01453-z

Knowledge Distillation: A Survey

Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this …

Cited by 1,325 publications (482 citation statements)
References: 261 publications
“…proposed TinyBERT, which aligns the hidden states and the attention heatmaps between student and teacher models. These methods usually learn the student model from a single teacher model (Gou et al., 2020). However, the knowledge and supervision provided by a single teacher model may be insufficient to learn an accurate student model, and the student model may also inherit the bias in the teacher model (Bhardwaj et al., 2020).…”
Section: Introduction (mentioning)
confidence: 99%
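
The excerpt above points at the core mechanism of TinyBERT-style distillation: penalizing the distance between student and teacher hidden states and attention maps at matched layers. A minimal PyTorch sketch of that idea follows, assuming a single pre-matched layer pair, equal attention-head counts, and plain MSE alignment losses; the class name IntermediateDistillLoss and all tensor shapes are illustrative assumptions, not TinyBERT's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class IntermediateDistillLoss(nn.Module):
    # Sketch only: aligns one student layer with one teacher layer.
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Learned projection so the student's smaller hidden states can be
        # compared with the teacher's in the teacher's embedding space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, s_hidden, t_hidden, s_attn, t_attn):
        # s_hidden: (batch, seq, student_dim); t_hidden: (batch, seq, teacher_dim)
        # s_attn, t_attn: (batch, heads, seq, seq) attention maps
        hidden_loss = F.mse_loss(self.proj(s_hidden), t_hidden)
        attn_loss = F.mse_loss(s_attn, t_attn)
        return hidden_loss + attn_loss

# Toy usage: batch 2, sequence length 8, 4 heads, student dim 128, teacher dim 256.
loss_fn = IntermediateDistillLoss(student_dim=128, teacher_dim=256)
s_h, t_h = torch.randn(2, 8, 128), torch.randn(2, 8, 256)
s_a, t_a = torch.randn(2, 4, 8, 8), torch.randn(2, 4, 8, 8)
loss = loss_fn(s_h, t_h, s_a, t_a)

In a multi-teacher setting, which the excerpt argues for, such losses would be averaged or weighted over several teachers rather than taken from a single one.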
“…The focus is on the generator, but it would be interesting to see whether the same mechanism can also improve the discriminator. Moreover, tuning the framework with data augmentation as a regularization [29] and with knowledge distillation [30] is also of interest. We will follow this idea and explore it as part of future work.…”
Section: Discussion (mentioning)
confidence: 99%
“…Since 2012, when AlexNet [13] won the 2012 ILSVRC competition [14], numerous important breakthroughs in computer vision have been achieved using DCNNs [15-20]. Benefiting from the development of DCNNs, the continuous optimization of object detection algorithms in natural images, and the release of open-source medical image datasets, studies on object detection in medical images have made significant progress.…”
Section: Related Work (mentioning)
confidence: 99%