2022
DOI: 10.1109/tcyb.2020.3007506
Highlight Every Step: Knowledge Distillation via Collaborative Teaching

Cited by 47 publications (17 citation statements)
References: 52 publications
“…Inspired by the idea of the probabilistic knowledge transfer method (17), which treats knowledge distillation as a metric learning problem, we proposed a CNN method that performs standard echocardiographic view recognition through knowledge distillation. Knowledge distillation enables the student network to learn the generalization ability of the teacher network by replacing the hard, original one-hot labels with the teacher's soft labels, and to learn the ability to distinguish similar features (17–20). Therefore, knowledge distillation compresses the model on the one hand and enhances its generalization ability on the other.…”
Section: Methods
confidence: 99%
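For readers unfamiliar with the soft-label mechanism this excerpt describes, the snippet below is a minimal sketch of a standard temperature-scaled distillation loss in PyTorch; it is not the cited authors' code, and the function name, temperature T, and weight alpha are illustrative assumptions.

```python
import torch.nn.functional as F

def soft_label_kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Blend the hard one-hot loss with a soft-label term that matches the
    student's temperature-softened predictions to the teacher's.
    T and alpha are illustrative hyperparameters, not values from the cited work."""
    # Hard-label cross-entropy on the original one-hot targets.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL divergence between temperature-scaled distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    return alpha * kl + (1.0 - alpha) * ce
```

The soft term is what transfers the teacher's ability to distinguish similar classes: the softened distribution encodes how alike the teacher considers the non-target classes, information a one-hot label discards.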
“…SP [28] instead preserves the pairwise similarities in the student's representation space rather than mimicking the teacher's representation space directly. CTKD [40] combines the knowledge from different teacher models to improve the student's performance in KD. Owing to its excellent performance, knowledge distillation has been used to solve a variety of complex applications such as object detection [41], [42], semantic segmentation [43], lane detection [44], face recognition [45]–[47], and action recognition [48].…”
Section: A. Data-Driven Knowledge Distillation
confidence: 99%
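As an illustration of the similarity-preserving idea the excerpt attributes to SP [28], here is a minimal sketch assuming batch features from the student and teacher backbones; the row normalization and scaling are assumptions for readability, not the original implementation.

```python
import torch
import torch.nn.functional as F

def similarity_preserving_loss(f_student, f_teacher):
    """Match the pairwise similarity structure of a batch rather than the
    teacher's features themselves (shapes and normalization are assumed)."""
    b = f_student.size(0)
    fs = f_student.view(b, -1)
    ft = f_teacher.view(b, -1)
    # Batch-wise Gram (similarity) matrices, row-normalized.
    g_s = F.normalize(fs @ fs.t(), p=2, dim=1)
    g_t = F.normalize(ft @ ft.t(), p=2, dim=1)
    # Penalize differences between the two similarity structures.
    return ((g_s - g_t) ** 2).sum() / (b * b)
```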
“…Park et al. [11] point out that KD only considers knowledge transfer on individual samples, and thus they propose to transfer the mutual relations of data examples from a teacher to a student by penalizing logit-based structural differences between them. Zhao et al. [12] exploit information from the training process for knowledge distillation by employing two teachers. One teacher uses its temporary output logits during training to supervise the student step by step, which helps the student find the optimal path toward the final logits.…”
Section: A. Logit-Based Distillation Approaches
confidence: 99%
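The two-teacher, step-by-step scheme described for Zhao et al. [12] could look roughly like the sketch below; the optimizer handling, loss weights, and the way the scratch-teacher and expert-teacher signals are combined are assumptions for illustration, not the paper's actual training loop.

```python
import torch
import torch.nn.functional as F

def two_teacher_step(student, scratch_teacher, expert_teacher,
                     s_opt, t_opt, x, y, T=4.0):
    """One illustrative step: a scratch teacher trained alongside the student
    supervises it with its temporary logits (the optimization path), while a
    pretrained expert teacher supplies the final-logit target."""
    # Advance the scratch teacher one step on the hard labels.
    t_loss = F.cross_entropy(scratch_teacher(x), y)
    t_opt.zero_grad(); t_loss.backward(); t_opt.step()

    # The student mimics the scratch teacher's current logits and the expert's.
    s_logits = student(x)
    with torch.no_grad():
        path_target = F.softmax(scratch_teacher(x) / T, dim=1)  # intermediate step
        goal_target = F.softmax(expert_teacher(x) / T, dim=1)   # final logits
    s_log = F.log_softmax(s_logits / T, dim=1)
    kd = F.kl_div(s_log, path_target, reduction="batchmean") \
       + F.kl_div(s_log, goal_target, reduction="batchmean")
    s_loss = F.cross_entropy(s_logits, y) + (T * T) * kd
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
    return s_loss.item()
```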
“…Generate x^{aug_i}_batch and x^{aug_j}_batch by using data augmentation (e.g., random flipping, padding, and cropping). With x_batch, minimize (12) with one gradient descent step for f_S; end for. To optimize (12), we draw points from linear regions (i.e., P_lr) by linear algebra. As shown in Figure 5, each data sample (e.g., x^{aug_i} and x^{aug_j}) can be considered as a high-dimensional vector:…”
Section: Locally Linear Region Knowledge Distillation
confidence: 99%
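Although this excerpt is fragmentary, the step it outlines (two augmented views of a batch, points drawn from the linear region between them, one gradient descent step for f_S) might be sketched as follows; the interpolation scheme, the 32×32 crop size, and the KL-based stand-in for objective (12) are assumptions, since the paper's actual loss is not reproduced in the excerpt.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Example augmentation pipeline (random flipping, padding, and cropping);
# the 32x32 crop assumes CIFAR-sized inputs and is purely illustrative.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
])

def linear_region_kd_step(student, teacher, optimizer, x_batch, T=4.0):
    """Build two augmented views, sample points on the line segment between
    them (a stand-in for the excerpt's linear regions P_lr), and take one
    gradient descent step for the student f_S on a placeholder KD loss."""
    x_aug_i = torch.stack([augment(img) for img in x_batch])
    x_aug_j = torch.stack([augment(img) for img in x_batch])
    # Treat each sample as a high-dimensional vector and interpolate between views.
    lam = torch.rand(x_batch.size(0), 1, 1, 1, device=x_batch.device)
    x_interp = lam * x_aug_i + (1.0 - lam) * x_aug_j
    with torch.no_grad():
        t_soft = F.softmax(teacher(x_interp) / T, dim=1)
    loss = F.kl_div(F.log_softmax(student(x_interp) / T, dim=1),
                    t_soft, reduction="batchmean") * (T * T)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```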