2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01173

Train in Germany, Test in the USA: Making 3D Object Detectors Generalize

Cited by 122 publications (168 citation statements)
References 47 publications
“…Evaluation Metric. We evaluate SEE on the "Car" category in the KITTI validation dataset, similar to other UDA methods [56,64]. We follow the official KITTI evaluation metric and report the average precision (AP) over 40 recall positions at 0.7 and 0.5 IoU thresholds for both BEV and 3D IoUs.…”
Section: SEE Results on Public Datasets
confidence: 99%
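The KITTI protocol mentioned in this statement averages precision over 40 equally spaced recall positions (the R40 variant). A minimal sketch of that computation, assuming the matching of detections to ground truth at the chosen IoU threshold has already been done (the function name and inputs are hypothetical):

```python
import numpy as np

def ap_r40(scores, is_tp, num_gt):
    """Average precision over 40 recall positions (KITTI R40-style).

    scores: detection confidence scores; is_tp: 1 if the detection matched
    a ground-truth box at the chosen IoU threshold, else 0; num_gt: number
    of ground-truth objects.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Sample interpolated precision (max precision at recall >= r)
    # at 40 equally spaced recall levels 1/40, 2/40, ..., 1.
    ap = 0.0
    for r in np.linspace(1.0 / 40, 1.0, 40):
        mask = recall >= r
        ap += (precision[mask].max() if mask.any() else 0.0) / 40
    return ap
```

With four detections, three of them true positives against four ground-truth boxes, `ap_r40([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1], 4)` gives 0.6875.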
“…For the specific task of adapting 3D object detection from a labelled source domain to an unlabelled target domain across distinct lidar scan patterns, research has been sparser. Wang et al [56] proposed a semi-supervised approach using object-size statistics of the target domain to resize training samples in the labelled source domain. A popular approach is the use of self-training [43,63,64,67], with a focus on generating quality pseudo-labels using temporal information [43,67] or an IoU scoring criterion for historical pseudo-labels [63,64].…”
Section: Related Work
confidence: 99%
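The object-size resizing attributed to Wang et al. [56] shifts source-domain box sizes toward the target domain's size statistics. A simplified sketch of that idea on boxes alone (the full method also deforms the points inside each box; the function name and the box layout are assumptions):

```python
import numpy as np

def resize_boxes_to_target_stats(boxes, src_mean_lwh, tgt_mean_lwh):
    """Shift source-domain box sizes toward target-domain statistics.

    boxes: (N, 7) array of [x, y, z, l, w, h, yaw]. Adds the gap between
    the mean target and mean source dimensions to every box -- a
    simplified sketch of statistical normalization as described in the
    citation above.
    """
    out = np.array(boxes, dtype=float)
    delta = np.asarray(tgt_mean_lwh, float) - np.asarray(src_mean_lwh, float)
    out[:, 3:6] += delta  # adjust l, w, h; position and yaw are untouched
    return out
```

For example, with a mean KITTI-like car of 4.0 x 1.6 x 1.5 m and a larger target mean of 4.8 x 1.9 x 1.7 m, every source box grows by (0.8, 0.3, 0.2) m.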
“…Besides, note that the camera parameters of the images in the KITTI test set are different from those of the training/validation set, and the good performance on the test set suggests the proposed method can also generalize to different camera parameters. However, generalizing to new scenes with different statistical characteristics is a hard task for existing 3D detectors (Wang et al., 2020b), including image-based and LiDAR-based models, and deserves further investigation in future work. We also argue that the proposed method can generalize to new scenes better than other monocular models because our model learns stronger features from the teacher net.…”
Section: A5 Generalization of the Proposed Method
confidence: 99%
“…Note that during these 3D object detection experiments, we only make use of angle estimators pre-trained on virtual data. Our method can therefore be considered fully self-supervised, without the use of human annotations, in contrast to DA-based methods such as [41].…”
Section: Methods
confidence: 99%
“…Many of these methods rely either on synthetic data or make use of adversarial feature learning [40]. The DA task for full 3D object detection has recently been considered in [41], one of the first works to do so. 4) 3D bounding box optimization: We use a 3D bounding box optimization process to obtain a 3D box from a 2D detector and its yaw angle estimate, based on geometric constraints.…”
Section: Monocular Vehicle Orientation Estimation
confidence: 99%
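The geometric-constraint step described in this last statement can be illustrated with a deliberately simplified sketch: slide a 3D box of known dimensions along the viewing ray through the 2D box centre and keep the depth whose projected corners best fill the 2D detection. This is a coarse search, not the cited method (which solves tighter per-edge constraints); all names and the camera convention are assumptions:

```python
import numpy as np

def corners_3d(center, dims, yaw):
    """8 corners of an (l, w, h) box rotated by yaw about the camera y-axis."""
    l, w, h = dims
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return (R @ np.vstack([x, y, z])).T + np.asarray(center, float)

def fit_translation(box2d, yaw, dims, K, z_range=(2.0, 60.0), steps=400):
    """Recover a 3D box centre from a 2D box [u1, v1, u2, v2], a yaw
    estimate, known dimensions, and 3x3 intrinsics K, by a coarse depth
    search along the ray through the 2D-box centre."""
    u0 = 0.5 * (box2d[0] + box2d[2])
    v0 = 0.5 * (box2d[1] + box2d[3])
    ray = np.linalg.inv(K) @ np.array([u0, v0, 1.0])
    best, best_err = None, np.inf
    for z in np.linspace(*z_range, steps):
        center = ray * (z / ray[2])          # candidate centre on the ray
        uv = K @ corners_3d(center, dims, yaw).T
        uv = uv[:2] / uv[2]                  # perspective projection
        proj = np.array([uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()])
        err = np.abs(proj - np.asarray(box2d, float)).sum()
        if err < best_err:
            best, best_err = center, err
    return best
```

The search is unimodal in depth for a box on the central ray, so the coarse grid recovers the depth to within its step size; a real system would refine this with least squares over all box edges.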