2019 IEEE Intelligent Vehicles Symposium (IV) 2019
DOI: 10.1109/ivs.2019.8813895
RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement

Abstract: We present RoarNet, a new approach for 3D object detection from a 2D image and 3D LiDAR point clouds. Based on a two-stage object detection framework ([1], [2]) with PointNet [3] as our backbone network, we suggest several novel ideas to improve 3D object detection performance. The first part of our method, RoarNet 2D, estimates the 3D poses of objects from a monocular image, which approximates where to examine further, and derives multiple candidates that are geometrically feasible. This step significantly narrows…

Cited by 173 publications (102 citation statements) · References 23 publications (40 reference statements)
“…LiDAR-Based 3D Object Detection. Existing works have explored three ways of processing the LiDAR data for 3D object detection: (1) As the convolutional neural networks (CNNs) can naturally process images, many works focus on projecting the LiDAR point cloud into the bird's eye view (BEV) images as a pre-processing step and then regressing the 3D bounding box based on the features extracted from the BEV images [2,56,57,24,20,64,59,63]; (2) On the other hand, one can divide the LiDAR point cloud into equally spaced 3D voxels and then apply 3D CNNs for 3D bounding box prediction [25,62,73]; (3) The most popular approach so far is to directly process the LiDAR point cloud through the neural network without pre-processing [22,10,45,65,61,40,41,44,11,71,16,54,34,23]. To this end, novel neural networks that can directly consume the point cloud are developed [7,35,47,69,18,53,15].…”
Section: Related Work
confidence: 99%
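The first of the three processing routes above, projecting the point cloud into a bird's-eye-view (BEV) image so that ordinary 2D CNNs can consume it, can be illustrated with a minimal sketch. This is not the implementation of any cited paper; the ranges, resolution, and max-height encoding are common but assumed choices:

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), res=0.1):
    """Project an (N, 3) LiDAR point cloud to a bird's-eye-view height
    image at `res` meters per pixel, keeping the max height per cell."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Keep only points inside the region of interest.
    mask = (x >= x_range[0]) & (x < x_range[1]) & \
           (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[mask], y[mask], z[mask]
    # Discretize metric coordinates into integer pixel indices.
    xi = ((x - x_range[0]) / res).astype(int)
    yi = ((y - y_range[0]) / res).astype(int)
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    # Unbuffered in-place max: each cell ends up with its tallest point.
    np.maximum.at(bev, (xi, yi), z)
    return bev
```

The resulting 2D grid can then be fed to a standard image detector for 3D box regression, which is the appeal of this route.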
“…Following the tremendous advances in deep learning methods for computer vision, a large body of literature has investigated to what extent this technology could be applied towards object detection from lidar point clouds [31,29,30,11,2,21,15,28,26,25]. While there are many similarities between the modalities, there are two key differences: 1) the point cloud is a sparse representation, while an image is dense and 2) the point cloud is 3D, while the image is 2D.…”
Section: Introduction
confidence: 99%
“…PointNet [139] and its improved version PointNet++ [140] propose to predict individual features for each point and aggregate the features from several points via max pooling. This method was first introduced for 3D object recognition and later extended by Qi et al [105], Xu et al [104] and Shin et al [141] to 3D object detection in combination with RGB images. Furthermore, Wang et al [142] propose a new learnable operator called Parametric Continuous Convolution to aggregate points via a weighted sum, and Li et al [143] propose to learn a χ-transformation before applying a standard CNN to the transformed point-cloud features.…”
Section: LiDAR Point Clouds
confidence: 99%
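The PointNet idea quoted above, a shared per-point MLP followed by symmetric max pooling, can be sketched in a few lines. This is a toy illustration with random weights, not the published architecture; the layer sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_features(points, w1, w2):
    """Toy PointNet-style encoder: a shared per-point MLP (applied
    identically to every point) followed by max pooling, so the
    global feature is invariant to the ordering of the points."""
    h = np.maximum(points @ w1, 0.0)  # shared MLP layer 1 + ReLU
    h = np.maximum(h @ w2, 0.0)       # shared MLP layer 2 + ReLU
    return h.max(axis=0)              # symmetric (order-invariant) pooling

# Random weights stand in for trained parameters (illustration only).
w1 = rng.normal(size=(3, 16))
w2 = rng.normal(size=(16, 32))
pts = rng.normal(size=(128, 3))

f = pointnet_features(pts, w1, w2)
# Permuting the input points leaves the global feature unchanged.
perm = rng.permutation(128)
assert np.allclose(f, pointnet_features(pts[perm], w1, w2))
```

The max pooling is what makes the representation a set function: any permutation of the 128 points yields the same 32-dimensional feature, which is why this family of networks can consume raw, unordered point clouds.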