2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2018.8594049

Joint 3D Proposal Generation and Object Detection from View Aggregation

Abstract: We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using t…
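
The abstract describes an RPN that fuses features from a LIDAR bird's-eye-view map and an RGB image feature map for each 3D anchor before scoring proposals. Below is a minimal sketch of that crop-and-fuse idea; the function names, the nearest-neighbour resizing, and the element-wise mean fusion are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of the multimodal fusion idea from the abstract: features from a
# BEV (LiDAR) map and an image map are cropped at an anchor's projection,
# resized to a common grid, and fused into one shared feature per anchor.
import numpy as np

def crop_and_resize(feature_map, box, out_size=(3, 3)):
    """Crop an (H, W, C) feature map to box = (y1, x1, y2, x2) and resample it
    to out_size with nearest-neighbour indexing."""
    y1, x1, y2, x2 = box
    ys = np.linspace(y1, y2 - 1, out_size[0]).astype(int)
    xs = np.linspace(x1, x2 - 1, out_size[1]).astype(int)
    return feature_map[np.ix_(ys, xs)]

def fuse_views(bev_features, img_features, bev_box, img_box):
    """Element-wise mean fusion of equally sized crops from the two views."""
    bev_crop = crop_and_resize(bev_features, bev_box)
    img_crop = crop_and_resize(img_features, img_box)
    return 0.5 * (bev_crop + img_crop)  # shared multimodal feature for the anchor

# Toy usage: two feature maps with the same channel depth and made-up boxes.
bev = np.random.rand(200, 176, 32)   # bird's-eye-view feature map
img = np.random.rand(180, 600, 32)   # image feature map
fused = fuse_views(bev, img, bev_box=(40, 60, 52, 72), img_box=(90, 300, 130, 360))
print(fused.shape)                   # (3, 3, 32)
```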

Cited by 1,356 publications (1,122 citation statements)
References 21 publications
“…The resulting point cloud is referred to as pseudo-LiDAR. The pseudo-LiDAR data can be further fed to 3D deep learning processing methods, such as PointNet (Qi, Su, Mo, & Guibas) or aggregate view object detection (AVOD; Ku, Mozifian, Lee, Harakeh, & Waslander, 2018). The success of image-based 3D estimation is of high importance to the large-scale deployment of autonomous cars, since the LiDAR is arguably one of the most expensive hardware components in a self-driving vehicle.…”
Section: Deep Learning For Driving Scene Perception and Localization (mentioning)
confidence: 99%
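
The citation above refers to the pseudo-LiDAR pipeline: a depth map estimated from images is back-projected into a 3D point cloud that LiDAR-style detectors such as PointNet or AVOD can consume. A minimal sketch of that back-projection follows, assuming a standard pinhole camera model; the intrinsics and image size are illustrative.

```python
# Hedged sketch of pseudo-LiDAR generation: back-project an (H, W) depth map
# (in metres) into a point cloud in the camera frame (X right, Y down, Z forward).
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Return an (H*W, 3) array of 3D points from a dense depth map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage with a KITTI-like image size and illustrative intrinsics.
depth = np.full((375, 1242), 20.0)   # a flat 20 m depth estimate
points = depth_to_pseudo_lidar(depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
print(points.shape)                  # (465750, 3)
```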
“…PointNet and VoxelNet (Zhou & Tuzel, 2018)… The main disadvantage of using a LiDAR in the sensory suite of a self-driving car is primarily its cost. A solution here would be to use neural network architectures, such as AVOD (Ku et al., 2018), which leverage LiDAR data only for training, while images are used during both training and deployment. At the deployment stage, AVOD is able to predict 3D bounding boxes of objects solely from image data.…”
Section: Bounding-box-like Object Detectors (mentioning)
confidence: 99%
“…6D object pose estimators [40], [38], [163], [167], [168], [159], [160], [31], [32], [4], [35], [161] extract features from the input images and, using the trained regressor, estimate the objects' 6D pose. Several methods further refine the output of the trained regressors [101], [83], [79], [82], [108], [40], [38], [163], [167], [168], [159], [160], [31], [32], [4], [35], [161] (refinement block), and finally hypothesise the object pose after filtering. Table III details the regression-based methods.…”
Section: A Classification (mentioning)
confidence: 99%
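
The citation above summarises regression-based 6D pose estimation as a feature-extraction, regression, and refinement pipeline. The sketch below only mirrors that control flow with stand-in functions; it does not reproduce any of the cited methods.

```python
# Hedged sketch of the regression-based 6D pose pipeline: extract features,
# regress an initial rotation/translation, then optionally refine the estimate.
import numpy as np

def extract_features(image):
    # Stand-in for a learned backbone: a trivial image statistic vector.
    return np.array([image.mean(), image.std()])

def regress_pose(features, weights):
    # Stand-in for a trained regressor mapping features to a 6D pose
    # parameterised as (rx, ry, rz, tx, ty, tz).
    return weights @ features

def refine_pose(pose, n_iters=3, step=0.1):
    # Stand-in refinement loop: iteratively nudge the estimate toward a dummy target.
    target = np.zeros(6)
    for _ in range(n_iters):
        pose = pose + step * (target - pose)
    return pose

image = np.random.rand(128, 128)
weights = np.random.rand(6, 2)       # pretend these were learned
initial = regress_pose(extract_features(image), weights)
refined = refine_pose(initial)
print(initial.round(2), refined.round(2))
```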
“…Autonomous driving, a focus of attention of both industry and the research community in recent years, fundamentally requires accurate object detection and pose estimation in order for a vehicle to avoid collisions with pedestrians, cyclists, and cars. To this end, autonomous vehicles are equipped with active LIDAR sensors [142], [146], passive Mono/Stereo (Mo/St) RGB/D/RGB-D cameras [84], [87], and their fused systems [82], [108]. Robotics has various sub-fields that vastly benefit from accurate object detection and pose estimation.…”
Section: Introduction (mentioning)
confidence: 99%
“…Methods should run at at least 20 Hz, since onboard applications need to cover 360 degrees rather than the limited 90-degree field of view of the KITTI annotations. The methods plotted are FP: F-PointNet [20], AF: AVOD-FPN [9], M: MMF [13], I: IPOD [31], FC: F-ConvNet [26], S: STD [32], PR: PointRCNN [22], FPR: Fast Point R-CNN [2], SE: SECOND [28], PP: PointPillars [10], PI: PIXOR++ [29], and O: our HVNet. For PointPillars we use their PyTorch runtime for a fair comparison.…”
Section: Introduction (mentioning)
confidence: 99%