2021
DOI: 10.1007/s11263-021-01453-z

Knowledge Distillation: A Survey

Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this …

Cited by 1,325 publications (482 citation statements)
References: 261 publications
“…proposed TinyBERT, which aligns the hidden states and the attention heatmaps between student and teacher models. These methods usually learn the student model from a single teacher model (Gou et al., 2020). However, the knowledge and supervision provided by a single teacher model may be insufficient to learn an accurate student model, and the student model may also inherit the bias in the teacher model (Bhardwaj et al., 2020).…”
Section: Introduction (mentioning)
confidence: 99%
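
The excerpt above points at the core mechanism of TinyBERT-style distillation: penalizing the distance between student and teacher hidden states and attention maps at matched layers. A minimal PyTorch sketch of that idea follows, assuming a single pre-matched layer pair, equal attention-head counts, and plain MSE alignment losses; the class name IntermediateDistillLoss and all tensor shapes are illustrative assumptions, not TinyBERT's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class IntermediateDistillLoss(nn.Module):
    # Sketch only: aligns one student layer with one teacher layer.
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Learned projection so the student's smaller hidden states can be
        # compared with the teacher's in the teacher's embedding space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, s_hidden, t_hidden, s_attn, t_attn):
        # s_hidden: (batch, seq, student_dim); t_hidden: (batch, seq, teacher_dim)
        # s_attn, t_attn: (batch, heads, seq, seq) attention maps
        hidden_loss = F.mse_loss(self.proj(s_hidden), t_hidden)
        attn_loss = F.mse_loss(s_attn, t_attn)
        return hidden_loss + attn_loss

# Toy usage: batch 2, sequence length 8, 4 heads, student dim 128, teacher dim 256.
loss_fn = IntermediateDistillLoss(student_dim=128, teacher_dim=256)
s_h, t_h = torch.randn(2, 8, 128), torch.randn(2, 8, 256)
s_a, t_a = torch.randn(2, 4, 8, 8), torch.randn(2, 4, 8, 8)
loss = loss_fn(s_h, t_h, s_a, t_a)

In a multi-teacher setting, which the excerpt argues for, such losses would be averaged or weighted over several teachers rather than taken from a single one.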
“…The focus is on the generator, but it would be interesting to see whether the same mechanism can also improve the discriminator. Moreover, tuning the framework with data augmentation as a regularization [29] and with knowledge distillation [30] is also of interest. We will follow this idea and explore it as part of future work.…”
Section: Discussion (mentioning)
confidence: 99%
“…Since 2012, when AlexNet [13] won the 2012 ILSVRC competition [14], numerous important breakthroughs in computer vision have been achieved using DCNNs [15-20]. Benefiting from the development of DCNNs, the continuous optimization of object detection algorithms in natural images, and the release of open-source medical image datasets, studies on object detection in medical images have made significant progress.…”
Section: Related Work (mentioning)
confidence: 99%