2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2018.8594049

Joint 3D Proposal Generation and Object Detection from View Aggregation

Abstract: We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using t…
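
The abstract describes an RPN that fuses features from a LIDAR bird's-eye-view map and an RGB image feature map for each 3D anchor before scoring proposals. Below is a minimal sketch of that crop-and-fuse idea; the function names, the nearest-neighbour resizing, and the element-wise mean fusion are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of the multimodal fusion idea from the abstract: features from a
# BEV (LiDAR) map and an image map are cropped at an anchor's projection,
# resized to a common grid, and fused into one shared feature per anchor.
import numpy as np

def crop_and_resize(feature_map, box, out_size=(3, 3)):
    """Crop an (H, W, C) feature map to box = (y1, x1, y2, x2) and resample it
    to out_size with nearest-neighbour indexing."""
    y1, x1, y2, x2 = box
    ys = np.linspace(y1, y2 - 1, out_size[0]).astype(int)
    xs = np.linspace(x1, x2 - 1, out_size[1]).astype(int)
    return feature_map[np.ix_(ys, xs)]

def fuse_views(bev_features, img_features, bev_box, img_box):
    """Element-wise mean fusion of equally sized crops from the two views."""
    bev_crop = crop_and_resize(bev_features, bev_box)
    img_crop = crop_and_resize(img_features, img_box)
    return 0.5 * (bev_crop + img_crop)  # shared multimodal feature for the anchor

# Toy usage: two feature maps with the same channel depth and made-up boxes.
bev = np.random.rand(200, 176, 32)   # bird's-eye-view feature map
img = np.random.rand(180, 600, 32)   # image feature map
fused = fuse_views(bev, img, bev_box=(40, 60, 52, 72), img_box=(90, 300, 130, 360))
print(fused.shape)                   # (3, 3, 32)
```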

Cited by 1,356 publications (1,122 citation statements)
References 21 publications
“…The resulting point cloud is referred to as pseudo-LiDAR. The pseudo-LiDAR data can be further fed to 3D deep learning processing methods, such as PointNet (Qi, Su, Mo, & Guibas) or aggregate view object detection (AVOD; Ku, Mozifian, Lee, Harakeh, & Waslander, 2018). The success of image-based 3D estimation is of high importance to the large-scale deployment of autonomous cars, since the LiDAR is arguably one of the most expensive hardware components in a self-driving vehicle.…”
Section: Deep Learning For Driving Scene Perception and Localization (mentioning)
confidence: 99%
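
The citation above refers to the pseudo-LiDAR pipeline: a depth map estimated from images is back-projected into a 3D point cloud that LiDAR-style detectors such as PointNet or AVOD can consume. A minimal sketch of that back-projection follows, assuming a standard pinhole camera model; the intrinsics and image size are illustrative.

```python
# Hedged sketch of pseudo-LiDAR generation: back-project an (H, W) depth map
# (in metres) into a point cloud in the camera frame (X right, Y down, Z forward).
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Return an (H*W, 3) array of 3D points from a dense depth map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage with a KITTI-like image size and illustrative intrinsics.
depth = np.full((375, 1242), 20.0)   # a flat 20 m depth estimate
points = depth_to_pseudo_lidar(depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
print(points.shape)                  # (465750, 3)
```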
“…PointNet and VoxelNet (Zhou & Tuzel, 2018)… The main disadvantage of using a LiDAR in the sensory suite of a self-driving car is primarily its cost. A solution here would be to use neural network architectures, such as AVOD (Ku et al., 2018), which leverage LiDAR data only for training, while images are used during both training and deployment. At the deployment stage, AVOD is able to predict 3D bounding boxes of objects solely from image data.…”
Section: Bounding-box-like Object Detectors (mentioning)
confidence: 99%
“…6D object pose estimators [40], [38], [163], [167], [168], [159], [160], [31], [32], [4], [35], [161] extract features from the input images and, using the trained regressor, estimate the objects' 6D pose. Several methods further refine the output of the trained regressors [101], [83], [79], [82], [108], [40], [38], [163], [167], [168], [159], [160], [31], [32], [4], [35], [161] (refinement block), and finally hypothesise the object pose after filtering. Table III details the regression-based methods.…”
Section: A Classification (mentioning)
confidence: 99%
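
The citation above summarises regression-based 6D pose estimation as a feature-extraction, regression, and refinement pipeline. The sketch below only mirrors that control flow with stand-in functions; it does not reproduce any of the cited methods.

```python
# Hedged sketch of the regression-based 6D pose pipeline: extract features,
# regress an initial rotation/translation, then optionally refine the estimate.
import numpy as np

def extract_features(image):
    # Stand-in for a learned backbone: a trivial image statistic vector.
    return np.array([image.mean(), image.std()])

def regress_pose(features, weights):
    # Stand-in for a trained regressor mapping features to a 6D pose
    # parameterised as (rx, ry, rz, tx, ty, tz).
    return weights @ features

def refine_pose(pose, n_iters=3, step=0.1):
    # Stand-in refinement loop: iteratively nudge the estimate toward a dummy target.
    target = np.zeros(6)
    for _ in range(n_iters):
        pose = pose + step * (target - pose)
    return pose

image = np.random.rand(128, 128)
weights = np.random.rand(6, 2)       # pretend these were learned
initial = regress_pose(extract_features(image), weights)
refined = refine_pose(initial)
print(initial.round(2), refined.round(2))
```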
“…Autonomous driving, a focus of attention of both industry and the research community in recent years, fundamentally requires accurate object detection and pose estimation in order for a vehicle to avoid collisions with pedestrians, cyclists, and cars. To this end, autonomous vehicles are equipped with active LIDAR sensors [142], [146], passive Mono/Stereo (Mo/St) RGB/D/RGB-D cameras [84], [87], and their fused systems [82], [108]. Robotics has various sub-fields that vastly benefit from accurate object detection and pose estimation.…”
Section: Introduction (mentioning)
confidence: 99%
“…Methods should run at at least 20 Hz, since onboard applications need to cover 360 degrees rather than the limited 90-degree field of view of the KITTI annotations. The methods plotted are FP: F-PointNet [20], AF: AVOD-FPN [9], M: MMF [13], I: IPOD [31], FC: F-ConvNet [26], S: STD [32], PR: PointRCNN [22], FPR: Fast Point R-CNN [2], SE: SECOND [28], PP: PointPillars [10], PI: PIXOR++ [29], and O: our HVNet. For PointPillars we use their PyTorch runtime for a fair comparison.…”
Section: Introduction (mentioning)
confidence: 99%