2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.198
Deep MANTA: A Coarse-to-Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis from Monocular Image

Abstract: In this paper, we present a novel approach, called Deep MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine object proposal that boosts the vehicle detection. Moreover, the Deep MANTA network is able to localize vehicle parts even if these parts are not visible. In the inference, the…


Cited by 417 publications (315 citation statements)
References 41 publications
“…10 we show qualitative results on a set of images taken from the validation set for the classes Car (top), Pedestrian (middle) and Cyclist (bottom). We also provide a video 3 showing detection results obtained on a sequence from the validation set. The structure of the frames is similar to the one in Fig.…”
Section: 3D Detection (mentioning)
confidence: 99%
“…2 We calculated these from the precision-recall values published in the KITTI3D leaderboard page. 3 https://research.mapillary.com/publication/MonoDIS…”
Section: 3D Detection (mentioning)
confidence: 99%
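The excerpt above mentions recomputing scores from the precision-recall values published on the KITTI3D leaderboard. As background only (this is a minimal sketch, not the evaluation code used by that work), KITTI-style interpolated average precision over equally spaced recall thresholds can be computed like this; the function name and the toy PR curve are illustrative assumptions:

```python
def interpolated_ap(recalls, precisions, num_points=11):
    """Interpolated average precision over equally spaced recall
    thresholds (KITTI's original protocol used 11 points and later
    moved to 40). `recalls` and `precisions` are parallel lists of
    operating points from a precision-recall curve."""
    total = 0.0
    for i in range(num_points):
        threshold = i / (num_points - 1)
        # Interpolated precision at a threshold is the maximum
        # precision among points whose recall meets the threshold
        # (0 if no operating point reaches that recall).
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / num_points

# Toy PR curve: perfect precision up to 50% recall, then decaying.
recalls = [0.1, 0.3, 0.5, 0.7, 0.9]
precisions = [1.0, 1.0, 1.0, 0.6, 0.4]
ap = interpolated_ap(recalls, precisions)
```

The interpolation step (taking the max precision at or beyond each recall threshold) is what makes the metric monotone in recall, so it matches the published leaderboard convention rather than a raw trapezoidal area.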
“…Xu and Chen [46] proposed to fuse a monocular depth estimation module and achieved high-precision localization. Chabot et al [6] presented Deep MANTA (Deep Many-Tasks) for simultaneous vehicle detection, part localization and visibility characterization, but their method requires part locations and visibility annotations. In this paper, we propose a unified deep learning based pipeline, which does not require additional labels and can be trained end-to-end using a large number of augmented data.…”
Section: Related Work (mentioning)
confidence: 99%
“…2D-driven 3D bounding box (BB) detection methods enlarge the 2D search space using the available appearance and geometry information in the 3D space along with RGB images [90], [151], [153], [154], [144]. The methods presented in [84], [155] directly detect 3D BBs of the objects in a monocular RGB image exploiting contextual models as well as semantics.…”
Section: Introduction (mentioning)
confidence: 99%