2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00919
Localization Distillation for Dense Object Detection

Cited by 98 publications (24 citation statements) · References 45 publications
“…In early versions of YOLOv6, self-distillation is introduced only in large models (i.e., YOLOv6-M/L), applying the vanilla knowledge distillation technique of minimizing the KL-divergence between the class predictions of the teacher and the student. Meanwhile, DFL [8] is adopted as the regression loss to perform self-distillation on box regression, similar to [19].…”
Section: Self-distillation (mentioning, confidence: 99%)
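As a rough illustration of the vanilla knowledge-distillation step described in the statement above, the sketch below minimizes a temperature-scaled KL divergence between teacher and student class predictions. The function name, tensor shapes, and temperature are illustrative assumptions, not YOLOv6's actual implementation.

```python
# Minimal sketch of vanilla class-level knowledge distillation: the student's
# class logits are pushed toward the teacher's softened predictions by
# minimizing a temperature-scaled KL divergence. Names are illustrative.
import torch
import torch.nn.functional as F

def class_kd_loss(student_logits: torch.Tensor,
                  teacher_logits: torch.Tensor,
                  temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over class predictions, averaged over samples."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence;
    # the t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```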
“…Li et al. [24] used region proposals from the larger network to help the smaller network learn higher-level semantic information. Zheng et al. [25] transferred knowledge distillation from the classification head to the localization head of object detection, leading to a new distillation mechanism termed Localization Distillation (LD). LD makes logit mimicking a better alternative to feature imitation and reveals that the knowledge of object category and object location should be handled separately.…”
Section: B. Knowledge Distillation (mentioning, confidence: 99%)
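For readers unfamiliar with LD, the following sketch shows one way the idea can be expressed once each box edge is predicted as a discrete distribution over bins (as in GFL/DFL-style heads): the student's edge distributions are distilled toward the teacher's with a KL divergence. Tensor shapes, the temperature, and the function name are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of response-level localization distillation (LD), assuming
# the detection head predicts each box edge (l, r, t, b) as a discrete
# distribution over `n_bins` positions. Shapes and names are assumptions.
import torch
import torch.nn.functional as F

def localization_distillation_loss(student_reg: torch.Tensor,  # (N, 4, n_bins) logits
                                   teacher_reg: torch.Tensor,  # (N, 4, n_bins) logits
                                   temperature: float = 10.0) -> torch.Tensor:
    """KL divergence between teacher and student edge distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_reg / t, dim=-1)
    p_teacher = F.softmax(teacher_reg / t, dim=-1)
    # Sum the KL terms over the bins of each edge, then average over edges and boxes.
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=-1)
    return kl.mean() * (t * t)
```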
“…Recently, several works (Dai et al. 2021; Yang et al. 2021; Chen et al. 2021; Zhang and Ma 2020) achieve feature-based distillation by focusing on the foreground area or applying a weight matrix to the features. LD (Zheng et al. 2022) tackles the difficult problem of localization distillation at the response level by converting bounding-box regression to a probability distribution representation. Besides, cross-modal feature distillation approaches (Chong et al. 2022; Guo et al. 2021) are gaining popularity as a way to exploit the complementarity between different modalities.…”
Section: Related Work (mentioning, confidence: 99%)
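A minimal sketch of the foreground-focused feature imitation mentioned above might look like the following, assuming a binary foreground mask (e.g. derived from ground-truth boxes) and a 1x1 adapter to align channel widths; both are common choices but are assumptions here rather than any specific cited method.

```python
# Minimal sketch of feature-based distillation restricted to foreground regions:
# the student feature map imitates the teacher's only where a binary mask is set.
import torch
import torch.nn as nn

class MaskedFeatureImitation(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv adapts the student's channel width to the teacher's.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self,
                student_feat: torch.Tensor,   # (B, Cs, H, W)
                teacher_feat: torch.Tensor,   # (B, Ct, H, W)
                fg_mask: torch.Tensor) -> torch.Tensor:  # (B, 1, H, W), 1 = foreground
        diff = (self.adapter(student_feat) - teacher_feat) ** 2
        # Sum the squared error over channels at foreground locations,
        # normalized by the number of foreground locations.
        return (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
```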
“…current KD methods for object detection can be broadly classified into feature-based and response-based streams: the former carries out distillation at the feature level (Zagoruyko and Komodakis 2017; Romero et al. 2014; Huang and Wang 2017; Heo et al. 2019; Ye et al. 2020; Du et al. 2020) to enforce consistency of feature representations between the teacher-student pair, whereas the latter adopts the teacher's confident predictions as soft targets in addition to the hard ground-truth supervision (Yuan et al. 2020; Zheng et al. 2022; Dai et al. 2021). However, directly migrating existing KD methods to LiDAR-to-stereo cross-modal distillation is less effective due to the huge gap between the two extremely different modalities.…”
Section: Introduction (mentioning, confidence: 99%)
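To make the two streams concrete, a typical training objective simply adds weighted feature-level and response-level distillation terms on top of the hard ground-truth loss; the sketch below is only a schematic combination, and the weights are placeholders rather than values from any cited work.

```python
# Schematic sketch of a combined detector training objective: hard ground-truth
# supervision plus weighted response-level and feature-level distillation terms.
import torch

def total_training_loss(gt_loss: torch.Tensor,
                        response_kd_loss: torch.Tensor,
                        feature_kd_loss: torch.Tensor,
                        w_response: float = 1.0,
                        w_feature: float = 1.0) -> torch.Tensor:
    # Each term would be produced by losses like the sketches above.
    return gt_loss + w_response * response_kd_loss + w_feature * feature_kd_loss
```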