2023
DOI: 10.48550/arxiv.2303.11098
Preprint

A closer look at the training dynamics of knowledge distillation

Abstract: In this paper we revisit the efficacy of knowledge distillation as a function matching and metric learning problem. In doing so we verify three important design decisions, namely the normalisation, soft maximum function, and projection layers as key ingredients. We theoretically show that the projector implicitly encodes information on past examples, enabling relational gradients for the student. We then show that the normalisation of representations is tightly coupled with the training dynamics of this projec…
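The abstract frames feature distillation as matching normalised student and teacher representations through a trainable projector, with a soft maximum function applied before matching. The sketch below is a minimal, hypothetical PyTorch illustration of those three ingredients only; the paper's actual loss, projector design, and normalisation choice may differ, and all names, dimensions, and the temperature value here are assumptions.

    # Hypothetical sketch of feature distillation with a linear projector and
    # normalised representations, loosely following the abstract's ingredients.
    # The paper's exact formulation may differ; shapes and the loss are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureDistiller(nn.Module):
        def __init__(self, student_dim: int, teacher_dim: int):
            super().__init__()
            # Trainable projector mapping student features into the teacher's space.
            self.projector = nn.Linear(student_dim, teacher_dim)

        def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
            # Project the student features, then normalise both representations
            # (L2 normalisation is used here purely for simplicity).
            z_s = F.normalize(self.projector(student_feat), dim=-1)
            z_t = F.normalize(teacher_feat, dim=-1)
            # Soften both representations over the feature dimension as a stand-in
            # for the "soft maximum function" named in the abstract (temperature 0.1
            # is an arbitrary illustrative choice).
            log_p_s = F.log_softmax(z_s / 0.1, dim=-1)
            p_t = F.softmax(z_t / 0.1, dim=-1)
            # KL divergence between the softened representations as the matching loss.
            return F.kl_div(log_p_s, p_t, reduction="batchmean")

    # Example usage (hypothetical backbones):
    # distiller = FeatureDistiller(student_dim=512, teacher_dim=2048)
    # loss = distiller(student_backbone(x), teacher_backbone(x).detach())

A linear projector is the simplest possible choice here; the abstract's theoretical point is that even such a projector accumulates information about past examples during training, which is what makes the student's gradients relational.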

Cited by 2 publications (2 citation statements)
References 38 publications
“…According to Hugging Face Model repositories, the BERT models fine-tuned for the GLUE tasks have already been downloaded about 138,000 times in total at the time of writing. Research communities leverage torchdistill not only for knowledge distillation studies (Li et al., 2022a; Lin et al., 2022; Dong et al., 2022; Miles and Mikolajczyk, 2023), but also for the machine learning reproducibility challenge (MLRC) (Lee and Lee, 2023) and reproducible deep learning studies (Matsubara et al., 2022a,c; Furutanpey et al., 2023b,a; …). torchdistill is publicly available as a pip-installable PyPI package and will be maintained and upgraded to encourage coding-free reproducible deep learning and knowledge distillation studies.…”
Section: Discussion
confidence: 99%
“…Deep Neural Networks progressively generate features [36][37][38], with higher layers capturing critical features that are more closely related to the main task. Considering the training process of a DNN as the problem and its learned weights and parameters as the solution, the features generated within the depths of the DNN can be viewed as intermediate results of the solving process.…”
Section: Flow of Solution Procedures (FSP) Matrix
confidence: 99%
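The FSP matrix named in this section heading is commonly defined in the knowledge-distillation literature as the spatially averaged inner product between the channels of two feature maps, summarising how one layer's features transform into another's. Below is a minimal sketch under that standard definition; the layer pairing, shapes, and the loss mentioned in the comments are illustrative assumptions, not details taken from the citing paper.

    # Minimal sketch of an FSP (Flow of Solution Procedure) matrix between two
    # feature maps of the same network, assuming they share spatial size.
    import torch

    def fsp_matrix(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a: (B, Ca, H, W), feat_b: (B, Cb, H, W) with matching H and W.
        # Returns a (B, Ca, Cb) matrix of channel-wise inner products averaged
        # over spatial positions.
        b, ca, h, w = feat_a.shape
        cb = feat_b.shape[1]
        a = feat_a.reshape(b, ca, h * w)              # (B, Ca, HW)
        c = feat_b.reshape(b, cb, h * w)              # (B, Cb, HW)
        return torch.bmm(a, c.transpose(1, 2)) / (h * w)

    # In FSP-style distillation, the student is typically trained to minimise
    # e.g. the mean squared error between its FSP matrices and the teacher's
    # for corresponding layer pairs.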