2023
DOI: 10.1609/aaai.v37i2.25236
Curriculum Temperature for Knowledge Distillation

Abstract: Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we pr…
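As context for the abstract, the temperature enters the standard softened-softmax distillation loss as sketched below. This is a minimal illustrative PyTorch snippet, not code from the paper; the function name and default value are assumptions.

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    # A higher temperature softens both distributions, shrinking the
    # discrepancy the student must match (an easier task); a lower
    # temperature sharpens them (a harder task).
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # The temperature**2 factor keeps gradient magnitudes comparable
    # across temperature settings.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2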

Cited by 30 publications (3 citation statements)
References 34 publications (50 reference statements)
“…Refer to Algorithm 2 for details. In [38], [54], it was experimentally proven that the value of α has little influence on the final classification performance. Therefore, they fixed α to 1 in different datasets, while varying the value of β.…”
Section: E. Algorithm Implementation Process
confidence: 99%
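The α/β weighting mentioned in this statement presumably refers to the usual combination of a supervised cross-entropy term and a temperature-scaled distillation term. The sketch below assumes that standard form; the symbols mirror the quoted notation rather than the exact definitions in [38] or [54].

import torch.nn.functional as F

def total_loss(student_logits, teacher_logits, targets,
               alpha=1.0, beta=1.0, temperature=4.0):
    # alpha is reportedly fixed to 1 (little effect on accuracy); only beta is tuned.
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + beta * kd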
“…WSLD [32] introduced a weighted soft-label approach and assigned a dynamic weight to the distillation loss based on the student's and teacher's learning on the supervised task. CTKD [33] proposed a dynamic temperature hyperparameter distillation framework. This framework increases distillation loss by adjusting the temperature adversarially, allowing the student to conduct knowledge transfer from easy to complex.…”
Section: Active Knowledge Distillation
confidence: 99%
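One common way to realize the adversarial temperature adjustment described here is a learnable temperature fed through a gradient reversal layer, so that the same backward pass that lowers the distillation loss for the student raises it with respect to the temperature. The sketch below illustrates that idea under these assumptions; it is not the authors' released implementation, and the module and function names are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips and scales the gradient backward.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AdversarialTemperature(nn.Module):
    # A single learnable temperature updated *against* the distillation loss:
    # minimizing the loss w.r.t. the student maximizes it w.r.t. the temperature,
    # so task difficulty grows as training proceeds (lambd can follow a curriculum).
    def __init__(self, init_temp=1.0):
        super().__init__()
        self.temp = nn.Parameter(torch.tensor(init_temp))

    def forward(self, lambd=1.0):
        # In practice the temperature would also be constrained to stay positive.
        return GradReverse.apply(self.temp, lambd)

def adversarial_kd_loss(student_logits, teacher_logits, temp_module, lambd=1.0):
    t = temp_module(lambd)
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits / t, dim=1),
        reduction="batchmean",
    ) * t ** 2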
“…These weighted base classifiers are then integrated to generate a robust classifier. Utilizing AdaBoost regulations during deep neural network training has been shown to improve the representation power of network models [25][26][27][28][29][30][31][32][33]. For instance, Taherkhani et al [27] proposed AdaBoost-CNN by combining AdaBoost with a convolutional neural network (CNN), successfully addressing the multi-class imbalanced sample classification issue.…”
Section: Introduction
confidence: 99%
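As a generic illustration of the weighted-ensemble mechanism this last statement refers to, the scikit-learn snippet below fits a plain AdaBoost classifier and exposes the per-classifier weights; it is not the AdaBoost-CNN variant of [27].

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy multi-class problem: each boosted base classifier (a depth-1 tree by
# default) receives a weight, and the weighted votes form the final classifier.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6, random_state=0)
clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
print(clf.estimator_weights_[:5])  # weights assigned to the first five base classifiers
print(clf.score(X, y))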