2019
DOI: 10.1007/978-3-030-20893-6_13

Knowledge Distillation with Feature Maps for Image Classification

Abstract: Model reduction, which eases the computational cost and latency of complex deep learning architectures, has received increasing attention owing to its importance in model deployment. One promising method is knowledge distillation (KD), which creates a fast-to-execute student model that mimics a large teacher network. In this paper, we propose a method called KDFM (Knowledge Distillation with Feature Maps), which improves the effectiveness of KD by learning the feature maps from the teac…
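The abstract is cut off before it finishes describing the method, so only the general recipe is recoverable: train a small student to match both the teacher's softened logits (standard KD) and the teacher's intermediate feature maps. Below is a minimal PyTorch sketch of that generic recipe, not the published KDFM implementation; the Teacher/Student architectures, the 1x1 adapter bridging the channel mismatch, and the weights T, alpha, and beta are all illustrative assumptions.

```python
# Minimal sketch of feature-map distillation (illustrative, not the exact
# KDFM pipeline): the student is trained against the ground-truth labels,
# the teacher's softened logits, and the teacher's intermediate feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Teacher(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        f = self.features(x)                    # feature map: (B, 64, H, W)
        logits = self.head(f.mean(dim=(2, 3)))  # global average pool + classify
        return f, logits

class Student(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        )
        # 1x1 conv adapts the thinner student features to the teacher's width
        self.adapter = nn.Conv2d(16, 64, 1)
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        f = self.features(x)
        logits = self.head(f.mean(dim=(2, 3)))
        return self.adapter(f), logits

def distillation_loss(s_feat, s_logits, t_feat, t_logits, labels,
                      T=4.0, alpha=0.9, beta=1.0):
    """Cross-entropy + soft-label KD (Hinton-style) + feature-map matching."""
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    fm = F.mse_loss(s_feat, t_feat)             # align intermediate feature maps
    return (1 - alpha) * ce + alpha * kd + beta * fm

# Toy usage on random data.
teacher, student = Teacher(), Student()
teacher.eval()
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    t_feat, t_logits = teacher(x)
s_feat, s_logits = student(x)
loss = distillation_loss(s_feat, s_logits, t_feat, t_logits, y)
loss.backward()
```

The 1x1 adapter is the usual way to compare feature maps when the student is narrower than the teacher; it is trained jointly with the student and discarded at inference time.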

Cited by 22 publications (16 citation statements). References 12 publications (25 reference statements).
“…In the last few years, a variety of knowledge distillation methods have been widely used for model compression in different visual recognition applications. Specifically, most knowledge distillation methods were first developed for image classification (Li and Hoiem, 2017; Peng et al., 2019b; Bagherinezhad et al., 2018; Chen et al., 2018a; Wang et al., 2019b; Mukherjee et al., 2019; Zhu et al., 2019) and then extended to other visual recognition applications, including face recognition (Luo et al., 2016; Kong et al., 2019; Yan et al., 2019; Ge et al., 2018; Wang et al., 2018b, 2019c; Duong et al., 2019; Wu et al., 2020; Wang et al., 2017), action recognition (Hao and Zhang, 2019; Thoker and Gall, 2019; Luo et al., 2018; Garcia et al., 2018; Wu et al., 2019b; Zhang et al., 2020), object detection (Hong and Yu, 2019; Shmelkov et al., 2017; Wei et al., 2018; Wang et al., 2019d), lane detection (Hou et al., 2019), image or video segmentation (He et al., 2019; Liu et al., 2019g; Mullapudi et al., 2019; Siam et al., 2019; Dou et al., 2020), video classification (Bhardwaj et al., 2019; Zhang and Peng, 2018), pedestrian detection (Shen et al., 2016), facial landmark detection (Dong and Yang, 2019), person re-identification (Wu et al., 2019a)…”
Section: KD in Visual Recognition
confidence: 99%
“…Recently, knowledge distillation has been used successfully for solving complex image classification problems. In addition, there are existing typical methods (Li and Hoiem, 2017; Bagherinezhad et al., 2018; Peng et al., 2019b; Chen et al., 2018a; Zhu et al., 2019; Wang et al., 2019b; Mukherjee et al., 2019). For incomplete, ambiguous, and redundant image labels, a label-refinery model using self-distillation and label progression was proposed to learn soft, informative, collective, and dynamic labels for complex image classification (Bagherinezhad et al., 2018).…”
Section: KD in Visual Recognition
confidence: 99%
“…Furthermore, we develop a free adversarial training variant of ARD and demonstrate appreciably accelerated performance. Recent work on distillation has produced significant improvements over vanilla knowledge distillation (Chen et al., 2018). We believe that Knowledge Distillation with Feature Maps could improve both natural and robust accuracy of student networks.…”
Section: Discussion
confidence: 95%
“…Knowledge distillation has also been used for adversarial attacks (Papernot et al., 2016b; Ross & Doshi-Velez, 2017; Gil et al., 2019; Goldblum et al., 2020), data security (Papernot et al., 2016a; Lopes et al., 2017), image processing (Li & Hoiem, 2017; Wang et al., 2017; Chen et al., 2018), natural language processing (Nakashole & Flauger, 2017; Mou et al., 2016; Hu et al., 2018; Freitag et al., 2017), and speech processing (Chebotar & Waters, 2016; Lu et al., 2017; Watanabe et al., 2017; Oord et al., 2018; Shen et al., 2018).…”
Section: A Extended Literature Review
confidence: 99%