2022
DOI: 10.1109/tcyb.2020.3007506
Highlight Every Step: Knowledge Distillation via Collaborative Teaching

Cited by 47 publications (17 citation statements)
References: 52 publications
“…Inspired by the idea of the probabilistic knowledge transfer method (17), which treats knowledge distillation as a metric learning problem, we proposed a CNN method that performs standard echocardiographic view recognition through knowledge distillation. Knowledge distillation enables the student network to learn the generalization ability of the teacher network by replacing the hard, original one-hot labels with the teacher's soft labels, and to learn the ability to distinguish similar features (17–20). Therefore, knowledge distillation compresses the model on the one hand and enhances its generalization ability on the other.…”
Section: Methods
confidence: 99%
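For readers unfamiliar with the soft-label mechanism this excerpt describes, the snippet below is a minimal sketch of a standard temperature-scaled distillation loss in PyTorch; it is not the cited authors' code, and the function name, temperature T, and weight alpha are illustrative assumptions.

```python
import torch.nn.functional as F

def soft_label_kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Blend the hard one-hot loss with a soft-label term that matches the
    student's temperature-softened predictions to the teacher's.
    T and alpha are illustrative hyperparameters, not values from the cited work."""
    # Hard-label cross-entropy on the original one-hot targets.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL divergence between temperature-scaled distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    return alpha * kl + (1.0 - alpha) * ce
```

The soft term is what transfers the teacher's ability to distinguish similar classes: the softened distribution encodes how alike the teacher considers the non-target classes, information a one-hot label discards.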
“…SP [28] instead preserves the pairwise similarities in the student's representation space rather than mimicking the teacher's representation space directly. CTKD [40] combines the knowledge from different teacher models to improve the student's performance in KD. Owing to its excellent performance, knowledge distillation has been used to solve a variety of complex applications such as object detection [41], [42], semantic segmentation [43], lane detection [44], face recognition [45]–[47], and action recognition [48].…”
Section: A. Data-Driven Knowledge Distillation
confidence: 99%
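As an illustration of the similarity-preserving idea the excerpt attributes to SP [28], here is a minimal sketch assuming batch features from the student and teacher backbones; the row normalization and scaling are assumptions for readability, not the original implementation.

```python
import torch
import torch.nn.functional as F

def similarity_preserving_loss(f_student, f_teacher):
    """Match the pairwise similarity structure of a batch rather than the
    teacher's features themselves (shapes and normalization are assumed)."""
    b = f_student.size(0)
    fs = f_student.view(b, -1)
    ft = f_teacher.view(b, -1)
    # Batch-wise Gram (similarity) matrices, row-normalized.
    g_s = F.normalize(fs @ fs.t(), p=2, dim=1)
    g_t = F.normalize(ft @ ft.t(), p=2, dim=1)
    # Penalize differences between the two similarity structures.
    return ((g_s - g_t) ** 2).sum() / (b * b)
```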
“…Park et al. [11] point out that KD only considers knowledge transfer on individual samples, and thus they propose to transfer the mutual relations of data examples from a teacher to a student by penalizing logit-based structural differences between them. Zhao et al. [12] exploit information from the training process for knowledge distillation by employing two teachers. One teacher uses its temporary output logits during training to supervise the student step by step, which helps the student find the optimal path toward the final logits.…”
Section: A. Logit-Based Distillation Approaches
confidence: 99%
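The two-teacher, step-by-step scheme described for Zhao et al. [12] could look roughly like the sketch below; the optimizer handling, loss weights, and the way the scratch-teacher and expert-teacher signals are combined are assumptions for illustration, not the paper's actual training loop.

```python
import torch
import torch.nn.functional as F

def two_teacher_step(student, scratch_teacher, expert_teacher,
                     s_opt, t_opt, x, y, T=4.0):
    """One illustrative step: a scratch teacher trained alongside the student
    supervises it with its temporary logits (the optimization path), while a
    pretrained expert teacher supplies the final-logit target."""
    # Advance the scratch teacher one step on the hard labels.
    t_loss = F.cross_entropy(scratch_teacher(x), y)
    t_opt.zero_grad(); t_loss.backward(); t_opt.step()

    # The student mimics the scratch teacher's current logits and the expert's.
    s_logits = student(x)
    with torch.no_grad():
        path_target = F.softmax(scratch_teacher(x) / T, dim=1)  # intermediate step
        goal_target = F.softmax(expert_teacher(x) / T, dim=1)   # final logits
    s_log = F.log_softmax(s_logits / T, dim=1)
    kd = F.kl_div(s_log, path_target, reduction="batchmean") \
       + F.kl_div(s_log, goal_target, reduction="batchmean")
    s_loss = F.cross_entropy(s_logits, y) + (T * T) * kd
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
    return s_loss.item()
```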
“…Generate x^{aug_i}_batch and x^{aug_j}_batch by using data augmentation (e.g., random flipping, padding, and cropping). With x_batch, minimize (12) with one gradient descent step for f_S; end for. To optimize (12), we draw points from linear regions (i.e., P_lr) by linear algebra. As shown in Figure 5, each data sample (e.g., x^{aug_i} and x^{aug_j}) can be considered as a high-dimensional vector:…”
Section: Locally Linear Region Knowledge Distillation
confidence: 99%
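Although this excerpt is fragmentary, the step it outlines (two augmented views of a batch, points drawn from the linear region between them, one gradient descent step for f_S) might be sketched as follows; the interpolation scheme, the 32×32 crop size, and the KL-based stand-in for objective (12) are assumptions, since the paper's actual loss is not reproduced in the excerpt.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Example augmentation pipeline (random flipping, padding, and cropping);
# the 32x32 crop assumes CIFAR-sized inputs and is purely illustrative.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
])

def linear_region_kd_step(student, teacher, optimizer, x_batch, T=4.0):
    """Build two augmented views, sample points on the line segment between
    them (a stand-in for the excerpt's linear regions P_lr), and take one
    gradient descent step for the student f_S on a placeholder KD loss."""
    x_aug_i = torch.stack([augment(img) for img in x_batch])
    x_aug_j = torch.stack([augment(img) for img in x_batch])
    # Treat each sample as a high-dimensional vector and interpolate between views.
    lam = torch.rand(x_batch.size(0), 1, 1, 1, device=x_batch.device)
    x_interp = lam * x_aug_i + (1.0 - lam) * x_aug_j
    with torch.no_grad():
        t_soft = F.softmax(teacher(x_interp) / T, dim=1)
    loss = F.kl_div(F.log_softmax(student(x_interp) / T, dim=1),
                    t_soft, reduction="batchmean") * (T * T)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```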