2021
DOI: 10.1016/j.neucom.2021.04.026
Interactive Knowledge Distillation for image classification

Cited by 22 publications (6 citation statements)
References 12 publications
“…Knowledge distillation is another method, as used in [33], [34], in which knowledge is transferred from a large, complex DNN (teacher network) to a smaller, simpler DNN (student network). Although distillation has improved accuracy [35], information may be lost in the transfer, and training the large model carries an additional computational cost. An alternative approach is transfer learning (TL) [2], [36], which reuses model weights from previously trained models.…”
Section: Related Work
confidence: 99%
“…Knowledge distillation has been shown to significantly improve the accuracy of small, simple neural networks. For example, Fu et al [28] propose a knowledge distillation method, called interactive knowledge distillation, for training a lightweight student network under the guidance of a well-trained, large teacher network; the resulting student outperformed a larger and more complex DNN on several image classification tasks.…”
Section: Knowledge Distillation
confidence: 99%
“…Knowledge distillation, as used in [26], [27], involves transferring knowledge from a large, complex DNN (teacher network) to a smaller, simpler DNN (student network). While distillation has improved accuracy [28], it may result in information loss during the transfer and requires additional computational cost for training the large model. Dense connection, an optimization technique used in [29], [30], connects all layers of the DNN architecture directly to one another, facilitating efficient information flow and preventing loss.…”
Section: Introduction
confidence: 99%
“…This strategy neglects to use previous knowledge as guidance. Most existing knowledge distillation methods transfer the prediction distribution as additional knowledge (Fu et al., 2021), while the feature relationship is not fully exploited during knowledge distillation. This paper proposes a novel few-shot learning algorithm based on meta-learning and a knowledge distillation strategy to improve the model's performance.…”
Section: Introduction
confidence: 99%
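The citing excerpts above repeatedly describe knowledge distillation as transferring a teacher's prediction distribution to a lightweight student. The sketch below illustrates that generic logit-based formulation only; it is not the interactive scheme of Fu et al. (2021), and the temperature T, weight alpha, and function name distillation_loss are assumptions chosen for illustration.

```python
# Minimal sketch of logit-based knowledge distillation (prediction-distribution
# transfer), assuming a PyTorch setup. Not the interactive method of Fu et al. (2021).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend a softened teacher/student KL term with standard cross-entropy."""
    # Soften both prediction distributions with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between softened distributions, scaled by T^2 as is conventional.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Typical usage: keep the teacher frozen and train only the student.
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
```

In this standard formulation the teacher runs in evaluation mode with gradients disabled, which is also where the computational cost noted in the excerpts arises: the large teacher must be trained beforehand and evaluated on every batch during student training.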