Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Wang, Yan; Chao, Weilun; Garg, Divyansh; Hariharan, Bharath; Campbell, Mark; Weinberger, Kilian Q.

doi:10.1109/cvpr.2019.00864

Cited by 900 publications

(772 citation statements)

References 36 publications

Supporting

Mentioning

711

Contrasting


“…Furthermore, additional work is required for bridging the gap between image- and LiDAR-based 3D perception (Wang et al, 2019), enabling the computer vision community to close the current debate on camera versus LiDAR as main perception sensors.…”

Section: Discussion (mentioning)

confidence: 99%

A survey of deep learning techniques for autonomous driving

Grigorescu, Trasnea, Cocias, et al. 2019

Journal of Field Robotics


The last decade witnessed increasingly rapid progress in self‐driving vehicle technology, mainly backed up by advances in the area of deep learning and artificial intelligence (AI). The objective of this paper is to survey the current state‐of‐the‐art on deep learning technologies used in autonomous driving. We start by presenting AI‐based self‐driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration, and motion control algorithms. We investigate both the modular perception‐planning‐action pipeline, where each module is built using deep learning methods, as well as End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as their safety, training data sources, and computational hardware. The comparison presented in this survey helps gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assist with design choices.


“…6D object pose estimators [40], [38], [163], [167], [168], [159], [160], [31], [32], [4], [35], [161] extract features from the input images, and using the trained regressor, estimate objects' 6D pose. Several methods further refine the output of the trained regressors [101], [83], [79], [82], [108], [40], [38], [163], [167], [168], [159], [160], [31], [32], [4], [35], [161] (refinement block), and finally hypothesise the object pose after filtering. Table III elaborates the regression-based methods.…”

Section: A Classification (mentioning)

confidence: 99%

“…Unlike the previous categories of methods, i.e., classification-based and regression-based, this category performs the classification and regression tasks within a single architecture. The methods can firstly do the classification, the outcomes of which are cured in a regression-based refinement step [105], [84], [78], [166] or vice versa [75], or can do the classification and regression in a single-shot process [87], [145], [101], [106], [100], [148], [103], [102], [30], [37], [162].…”

Section: B Regression (mentioning)

confidence: 99%

A review on object pose recovery: From 3D bounding box detectors to full 6D pose estimators

Şahin, Garcia-Hernando, Sock, et al. 2020

Image and Vision Computing


Object pose recovery has gained increasing attention in the computer vision field as it has become an important problem in rapidly evolving technological areas related to autonomous driving, robotics, and augmented reality. Existing review-related studies have addressed the problem at visual level in 2D, going through the methods which produce 2D bounding boxes of objects of interest in RGB images. The 2D search space is enlarged either using the geometry information available in the 3D space along with RGB (Mono/Stereo) images, or utilizing depth data from LIDAR sensors and/or RGB-D cameras. 3D bounding box detectors, producing category-level amodal 3D bounding boxes, are evaluated on gravity aligned images, while full 6D object pose estimators are mostly tested at instance-level on the images where the alignment constraint is removed. Recently, 6D object pose estimation is tackled at the level of categories. In this paper, we present the first comprehensive and most recent review of the methods on object pose recovery, from 3D bounding box detectors to full 6D pose estimators. The methods mathematically model the problem as a classification, regression, classification & regression, template matching, and point-pair feature matching task. Based on this, a mathematical-model-based categorization of the methods is established. Datasets used for evaluating the methods are investigated with respect to the challenges, and evaluation metrics are studied. Quantitative results of experiments in the literature are analysed to show which category of methods best performs across what types of challenges. The analyses are further extended comparing two methods, which are our own implementations, so that the outcomes from the public results are further solidified. Current position of the field is summarized regarding object pose recovery, and possible research directions are identified.


“…, where n is the number of points. The point cloud obtained from the intermediate depth map is named as Pseudo-LiDAR [22].…”

Section: B Transformation Module (mentioning)

confidence: 99%

“…Therefore, previous researches [3], [9], [19] prefer to achieve different tasks on 2D depth maps or other projected views. Moreover, the target point cloud can be constructed in the form of Pseudo-LiDAR [22] using known camera intrinsics. For Pseudo-LiDAR interpolation, an intermediate depth map is first generated and then back-projected into the 3D space.…”

Section: Introduction (mentioning)

confidence: 99%

PLIN: A Network for Pseudo-LiDAR Point Cloud Interpolation

Liu, Liao, Lin, et al. 2020

Sensors


LiDAR sensors can provide dependable 3D spatial information at a low frequency (around 10Hz) and have been widely applied in the field of autonomous driving and UAV. However, the camera with a higher frequency (around 20Hz) has to be decreased so as to match with LiDAR in a multisensor system. In this paper, we propose a novel Pseudo-LiDAR interpolation network (PLIN) to increase the frequency of LiDAR sensors. PLIN can generate temporally and spatially high-quality point cloud sequences to match the high frequency of cameras. To achieve this goal, we design a coarse interpolation stage guided by consecutive sparse depth maps and motion relationship. We also propose a refined interpolation stage guided by the realistic scene. Using this coarse-to-fine cascade structure, our method can progressively perceive multi-modal information and generate accurate intermediate point clouds. To the best of our knowledge, this is the first deep framework for Pseudo-LiDAR point cloud interpolation, which shows appealing applications in navigation systems equipped with LiDAR and cameras. Experimental results demonstrate that PLIN achieves promising performance on the KITTI dataset, significantly outperforming the traditional interpolation method and the state-of-the-art video interpolation technique.
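
The citation statements from this paper describe building a Pseudo-LiDAR point cloud by back-projecting an intermediate depth map into 3D space using known camera intrinsics. A minimal sketch of that step is given here, assuming a standard pinhole camera model. It is an illustration only, not the implementation from any of the listed papers. The function name `depth_to_pseudo_lidar` and the parameter names `fx`, `fy`, `cx`, `cy` (focal lengths and principal point, in pixels) are hypothetical, and dropping pixels with non-positive depth is an assumption made for the example.

```python
import numpy as np


def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (n, 3) point cloud.

    Assumes a pinhole camera: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy, in camera coordinates.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape

    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Flatten to one 3D point per pixel.
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Assumption for this sketch: non-positive depth means "no measurement".
    return points[points[:, 2] > 0]


# Hypothetical 2x3 depth map with one missing pixel (depth 0).
example_depth = np.full((2, 3), 10.0)
example_depth[1, 2] = 0.0
example_points = depth_to_pseudo_lidar(example_depth, fx=100.0, fy=100.0, cx=1.0, cy=0.5)
```

Here `n` is the number of pixels with a valid depth value, so the result has one row per surviving pixel. The points are expressed in the camera frame; converting them to a LiDAR frame would additionally require the camera-to-LiDAR extrinsics, which this sketch does not cover.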
