2020 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn48605.2020.9207235
KTAN: Knowledge Transfer Adversarial Network

Cited by 20 publications (12 citation statements). References 14 publications.
“…Hint learning [24] distills a deeper and thinner student model by imitating both the soft outputs and intermediate feature representations of the teacher model. Similar works are presented in [31,19,6] but are designed mainly for classifiers.…”
Section: Knowledge Distillation (mentioning)
confidence: 97%
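For context, a minimal PyTorch sketch of the feature-imitation ("hint") term the excerpt describes. The 1x1 convolutional regressor bridging the student's and teacher's channel counts, and all shapes and names, are illustrative assumptions rather than details from [24]:

```python
import torch.nn as nn
import torch.nn.functional as F

# Assumed shapes: the student's hint layer has 128 channels, the teacher's 256.
regressor = nn.Conv2d(128, 256, kernel_size=1)  # learned adapter, FitNets-style

def hint_loss(student_feat, teacher_feat):
    """L2 imitation of the teacher's intermediate feature map."""
    return F.mse_loss(regressor(student_feat), teacher_feat.detach())
```

In practice this term is combined with a soft-output loss such as the one sketched after the next excerpt.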
“…The distilled knowledge is defined as soft label outputs from a large teacher network, which possibly contain the structural information among different classes. Following KD, many methods have been proposed to either utilize the softmax outputs [6,19] or mimic the feature layer of the teacher network [24,30,31]. However, these methods are designed mainly for multi-label classification and cannot be adapted directly to object detection.…”
Section: Introduction (mentioning)
confidence: 99%
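The "soft label outputs" mentioned above are the temperature-softened logits of classic knowledge distillation. A standard sketch, where the temperature T and mixing weight alpha are tunable assumptions:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD: softened KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```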
“…Fortunately, some recently proposed subsampling methods [36], [37] may be applied to eliminate these low-quality samples. Additionally, some works [38]-[41] propose to incorporate the adversarial loss of GANs into KD, but their performance is not state-of-the-art.…”
Section: Introduction (mentioning)
confidence: 99%
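The works cited as [38]-[41] fold a GAN objective into distillation. A hedged sketch of the general pattern (not the exact formulation of any of those papers, and with a placeholder 256-dimensional feature size): a small discriminator learns to separate teacher features from student features, and the student receives an extra loss for failing to fool it.

```python
import torch
import torch.nn as nn

# Placeholder discriminator over 256-dim (flattened) features.
disc = nn.Sequential(nn.Linear(256, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

def adversarial_kd_losses(student_feat, teacher_feat, lam=0.1):
    n = teacher_feat.size(0)
    real, fake = torch.ones(n, 1), torch.zeros(n, 1)
    # Discriminator step: teacher features are "real", student's are "fake".
    d_loss = bce(disc(teacher_feat.detach()), real) + bce(disc(student_feat.detach()), fake)
    # Student step: extra term rewarding features the discriminator calls "real".
    g_loss = lam * bce(disc(student_feat), real)
    return d_loss, g_loss  # optimized by separate optimizers, alternating steps
```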
“…Although existing GAN-related KD methods [38]-[41] seem related to the proposed cGAN-KD in concept, our method is fundamentally different from them, mainly for three reasons: (1) Our approach is the first framework that utilizes cGAN-generated samples to distill and transfer knowledge, while works [38]-[41] only incorporate adversarial losses into conventional KD methods (e.g., [4]) and cannot achieve state-of-the-art performance. (2) Our KD framework is applicable to both classification and regression tasks, while the KD methods in [38]-[41] apply only to classification tasks. (3) Our approach is compatible with state-of-the-art KD methods (e.g., [15], [16]) and can generally boost their performance, while the methods in [38]-[41] do not have this merit.…”
Section: Introduction (mentioning)
confidence: 99%
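By contrast, cGAN-KD as the excerpt describes it distills through generated data rather than through an extra loss term. A rough sketch of that idea under assumptions (a pretrained class-conditional generator with signature G(z, y); every name here is hypothetical, not the authors' actual pipeline):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_fakes(G, n, n_classes, z_dim=128):
    """Draw label-conditioned samples from a trained cGAN generator."""
    z = torch.randn(n, z_dim)
    y = torch.randint(0, n_classes, (n,))
    return G(z, y)

def distill_step(student, teacher, G, n_classes, T=4.0):
    x_fake = sample_fakes(G, n=64, n_classes=n_classes)
    with torch.no_grad():
        t_logits = teacher(x_fake)
    s_logits = student(x_fake)
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
```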