Residual error based knowledge distillation (2021)
DOI: 10.1016/j.neucom.2020.10.113

Cited by 32 publications (19 citation statements)
References 21 publications (24 reference statements)
“…The model capacity gap between the large deep neural network and a small student neural network can degrade knowledge transfer (Mirzadeh et al, 2020;Gao et al, 2021). To effectively transfer knowledge to student networks, a variety of methods have been proposed for a controlled reduction of the model complexity (Zhang et al, 2018b;Nowak and Corso, 2018;Crowley et al, 2018;Liu et al, 2019a,i;Wang et al, 2018a;Gu and Tresp, 2020).…”
Section: Teacher-student Architecture (mentioning)
confidence: 99%
“…However, some recent studies have argued a different view. [27] and [10] found that a large capacity gap between teacher and student may hinder knowledge transfer, and introduced assistant networks to narrow the gap. [29] proposed learning a student-friendly teacher by plugging in student branches during the training procedure.…”
Section: B. Experience Ensemble Knowledge Distillation (mentioning)
confidence: 99%
“…Some researchers find that KD can lead students to suboptimal converged performance when the accuracy gap between teacher and student is too large [15,31]. Moreover, [1,14] have shown that the very early phase of training is essential for the network, so such a severe gap may damage overall performance during these critical early phases.…”
Section: Analysis of Training (mentioning)
confidence: 99%
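The excerpts above all discuss how a capacity gap affects knowledge distillation. For context, the distillation objective these works build on is typically the standard Hinton-style soft-target loss: a temperature-softened KL term between teacher and student outputs plus a cross-entropy term on the true label. A minimal pure-Python sketch (function names, temperature `T`, and weight `alpha` are illustrative defaults, not values from the paper):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    z = [l / T for l in logits]
    m = max(z)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.9):
    """Hinton-style distillation loss for a single example:
    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label).
    The T^2 factor keeps gradient magnitudes comparable across temperatures."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[label])
    return alpha * T * T * kl + (1 - alpha) * ce

# Example: a confident teacher distilling into a weaker student.
teacher = [4.0, 1.0, 0.2]
student = [2.0, 1.5, 0.5]
loss = kd_loss(student, teacher, label=0)
```

When the teacher is much stronger than the student, its softened distribution is still hard for the student to match, which is exactly the capacity-gap failure mode the cited works ([27], [10], [29]) try to mitigate with assistant networks or student-friendly teachers.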