2020
DOI: 10.48550/arxiv.2006.13108
Preprint

Distilling Object Detectors with Task Adaptive Regularization

Ruoyu Sun,
Fuhui Tang,
Xiaopeng Zhang
et al.

Abstract: Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices. Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization. In this paper, we investigate each module of a typical detector in depth, and propose a general distillation framework that adaptively transfers knowledge from teacher to student according to th…

Cited by 15 publications (45 citation statements)
References 26 publications
“…Wang et al [28] propose the fine-grained mask to distill the regions calculated by ground-truth bounding boxes. Sun et al [25] utilize the Gaussian Mask to cover the ground-truth for distillation. Such methods lack distillation for the background.…”
Section: Knowledge Distillation
Confidence: 99%
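The Gaussian-mask idea this statement attributes to Sun et al. [25] can be illustrated in a few lines. The PyTorch snippet below is a rough sketch only: the sigma heuristic (proportional to box half-size) and the per-pixel max aggregation over boxes are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def gaussian_mask(h, w, boxes):
    """Return an (h, w) mask that peaks at each box centre and decays outward.

    boxes: (N, 4) tensor of (x1, y1, x2, y2) in feature-map coordinates.
    """
    ys = torch.arange(h, dtype=torch.float32).view(h, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, w)
    mask = torch.zeros(h, w)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        # Assumed heuristic: sigma proportional to box half-width/height.
        sx, sy = (x2 - x1) / 2 + 1e-6, (y2 - y1) / 2 + 1e-6
        g = torch.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                        + (ys - cy) ** 2 / (2 * sy ** 2)))
        mask = torch.maximum(mask, g)  # keep the strongest response per pixel
    return mask

# Example: a mask emphasizing one ground-truth box on a 32x32 feature map.
m = gaussian_mask(32, 32, torch.tensor([[4.0, 4.0, 12.0, 20.0]]))
```

Weighting a teacher-student feature difference by such a mask concentrates the distillation signal near object centres, which is what the cited critique means by "lacking distillation for the background".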
“…Mimick [15] distills the positive area proposed by the region proposal network (RPN) of the student detector. FGFI [28] and TADF [25] use the fine-grained and Gaussian Mask to select the distillation area, respectively. Defeat [7] distills the foreground and background separately.…”
Section: Introduction
Confidence: 99%
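The decoupled foreground/background distillation attributed to Defeat [7] amounts to two masked imitation losses with separate weights. A minimal sketch, assuming a binary foreground mask and an MSE imitation objective; the weights `w_fg`, `w_bg` and the per-region normalization are illustrative choices, not taken from the paper:

```python
import torch

def decoupled_distill_loss(f_s, f_t, fg_mask, w_fg=1.0, w_bg=0.5):
    """f_s, f_t: (B, C, H, W) student/teacher features.
    fg_mask: (B, 1, H, W) binary mask, 1 on ground-truth regions."""
    bg_mask = 1.0 - fg_mask
    diff = (f_s - f_t.detach()) ** 2
    # Normalize each term by its own region size so small foregrounds
    # are not drowned out by the much larger background.
    fg = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
    bg = (diff * bg_mask).sum() / bg_mask.sum().clamp(min=1.0)
    return w_fg * fg + w_bg * bg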
“…KD was first popularised for image classification [14] where a student model is trained to mimic the soft labels generated by a teacher model. However, this … [10,30,34]. Our approach (d) focuses on a few key predictive regions of the teacher.…”
Section: Introduction
Confidence: 99%
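The soft-label KD referenced here [14] is the classic Hinton et al. objective: the student matches the teacher's temperature-softened class distribution under a KL divergence. A standard PyTorch rendering:

```python
import torch.nn.functional as F

def soft_label_kd(logits_s, logits_t, T=4.0):
    # Teacher's softened distribution and student's softened log-probs.
    p_t = F.softmax(logits_t / T, dim=-1)
    log_p_s = F.log_softmax(logits_s / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable to the
    # hard-label cross-entropy term it is usually combined with.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```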
“…While soft label-based KD can be directly applied for classification, finding an equivalent for localisation remains a challenge. Recent work [9,10,30,34,35,37,41] alleviates this problem by forcing the student model to generate feature maps similar to the teacher counterpart; a process known as feature imitation.…”
Section: Introduction
Confidence: 99%
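Feature imitation, as the statement describes it, pushes the student's feature map toward the teacher's. A minimal sketch, assuming a 1x1 convolutional adapter to reconcile channel widths and a plain MSE objective; these are common choices in this line of work, not any single cited paper's loss:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureImitation(nn.Module):
    def __init__(self, c_student, c_teacher):
        super().__init__()
        # 1x1 conv lifts student channels to the teacher's width.
        self.adapt = nn.Conv2d(c_student, c_teacher, kernel_size=1)

    def forward(self, f_s, f_t):
        # f_s: (B, C_s, H, W) student; f_t: (B, C_t, H, W) teacher (frozen).
        return F.mse_loss(self.adapt(f_s), f_t.detach())
```

Masked variants such as those above simply multiply this pointwise difference by a spatial weighting before reduction.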