2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01214

Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction

Abstract: We present MonoPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction. First, using the fundamental relations of a pinhole camera model, detections from a mature 2D object detector are used to generate a 3D proposal per object in a scene. The 3D location of these proposals proves to be quite accurate, which greatly reduces the difficulty of regressing the final 3D bounding box detection. Simultaneously, a point cloud is predicted in an object-centered coordinate system to …
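The proposal step summarised in the abstract rests on plain pinhole-camera geometry: a 2D box height and an assumed real-world object height determine a depth, from which a centroid can be back-projected. The sketch below illustrates only that relation; the function name propose_centroid, the intrinsics, and the assumed 1.5 m class height are illustrative assumptions, not the paper's actual code.

# Minimal sketch (not the authors' implementation): hypothesise a 3D centroid
# for a 2D detection via the pinhole relation z ≈ f_y * H / h, where H is an
# assumed real-world object height and h is the 2D box height in pixels.
def propose_centroid(box_2d, fx, fy, cx, cy, class_height_m=1.5):
    """box_2d = (x1, y1, x2, y2) in pixels; returns (X, Y, Z) in metres."""
    x1, y1, x2, y2 = box_2d
    h_px = max(y2 - y1, 1e-6)                 # 2D box height in pixels
    z = fy * class_height_m / h_px            # depth from similar triangles
    u, v = 0.5 * (x1 + x2), 0.5 * (y1 + y2)   # 2D box centre
    return (u - cx) * z / fx, (v - cy) * z / fy, z   # back-project to camera frame

# Example with KITTI-like intrinsics: a 120 px tall car box lands at ~9 m depth.
print(propose_centroid((600.0, 150.0, 680.0, 270.0), 720.0, 720.0, 620.0, 190.0))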

Cited by 258 publications (133 citation statements) | References 36 publications

“…We compare our approach with state-of-the-art methods [2], [6], [12], [13], [15], [16], [31], which are divided into two groups depending on the input (i.e., point clouds or camera images). One group consists of MonoPSR [31] (Mono-based) and Stereo R-CNN [6] (Stereo-based) which process camera images with RGB information. The other group includes MV3D (LiDAR) [2], BirdNet [12], RT3D [13], VeloFCN [15] and LMNet [16] which are based on point clouds only.…”
Section: B. Comparison With State-of-the-art Methods (citation type: mentioning)
confidence: 99%
“…There are several evaluation indicators that can be used for object detection, such as Precision, Recall, F1 score, average precision (AP), and mean average precision (mAP), as expressed by Formulas (4)–(8) [108,109,110,111,112,113,114], respectively. Precision represents the proportion of correct instances among all identified instances.…”
Section: Applications of Point Clouds Using Deep Learning (citation type: mentioning)
confidence: 99%
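For reference, the standard definitions behind the five indicators quoted above, written in LaTeX (the numbering (4)–(8) belongs to the cited survey and is not reproduced here; TP, FP and FN are true positives, false positives and false negatives, p(r) is precision as a function of recall, and N is the number of object classes):

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},

\mathrm{AP} = \int_0^1 p(r)\,\mathrm{d}r, \qquad
\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i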
“…Unlike the previous categories of methods, i.e., classification-based and regression-based, this category performs the classification and regression tasks within a single architecture. The methods can first perform classification, the outcomes of which are then refined in a regression-based refinement step [105], [84], [78], [166], or vice versa [75], or can perform classification and regression in a single-shot process [87], [145], [101], [106], [100], [148], [103], [102], [30], [37], [162].…”
Section: B. Regression (citation type: mentioning)
confidence: 99%
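The two patterns described in this quotation, a cascaded classify-then-refine head and a single-shot head that predicts scores and box parameters jointly, can be illustrated with a toy, self-contained sketch; the weights, feature shapes and function names below are made up for illustration and do not correspond to any of the cited architectures.

import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))   # 8 candidate regions, 16-dim features each
W_cls = rng.normal(size=(16, 1))      # toy classification weights
W_reg = rng.normal(size=(16, 4))      # toy box-regression weights

def cascaded(features, threshold=0.5):
    """Classify first, then regress box refinements only for confident regions."""
    scores = 1.0 / (1.0 + np.exp(-(features @ W_cls))).ravel()
    keep = scores > threshold
    return scores[keep], features[keep] @ W_reg   # regression-based refinement

def single_shot(features):
    """Predict class scores and box offsets for every region in one pass."""
    return 1.0 / (1.0 + np.exp(-(features @ W_cls))).ravel(), features @ W_reg

print(cascaded(features)[1].shape, single_shot(features)[1].shape)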
“…The regression of d is conducted with the L2 loss, while the bin-based discrete-continuous loss is applied to first discretize θ_y into n overlapping bins and then regress the angle within each bin. The input of MonoPSR [106] is an RGB image, which is not subjected to any pre-processing. Once the 2D bounding-box proposals for the object of interest are generated using MS-CNN [123], MonoPSR hypothesises 3D proposals, which are then fed into a CNN scoring and refinement step.…”
Section: Classification and Regression (citation type: mentioning)
confidence: 99%
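The bin-based discrete-continuous loss mentioned above supervises orientation as a discrete bin label plus a continuous residual. A minimal sketch of that encoding follows, assuming non-overlapping bins for simplicity (the quotation mentions overlapping bins, which would simply let an angle be encoded against more than one bin); the function names are hypothetical.

import numpy as np

def encode_angle(theta_y, num_bins=8):
    """Map theta_y in [-pi, pi) to (bin index, residual from the bin centre)."""
    bin_size = 2.0 * np.pi / num_bins
    shifted = (theta_y + np.pi) % (2.0 * np.pi)      # move angle to [0, 2*pi)
    bin_idx = int(shifted // bin_size)               # classification target
    residual = shifted - (bin_idx + 0.5) * bin_size  # regression target
    return bin_idx, residual

def decode_angle(bin_idx, residual, num_bins=8):
    """Invert encode_angle back to an angle in [-pi, pi)."""
    bin_size = 2.0 * np.pi / num_bins
    return (bin_idx + 0.5) * bin_size + residual - np.pi

bin_idx, residual = encode_angle(0.7)
assert abs(decode_angle(bin_idx, residual) - 0.7) < 1e-9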