2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.198
Deep MANTA: A Coarse-to-Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis from Monocular Image

Abstract: In this paper, we present a novel approach, called Deep MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine object proposal that boosts the vehicle detection. Moreover, the Deep MANTA network is able to localize vehicle parts even if these parts are not visible. In the inference, the…


Cited by 417 publications (315 citation statements)
References 41 publications
“…10 we show qualitative results on a set of images taken from the validation set for the classes Car (top), Pedestrian (middle) and Cyclist (bottom). We also provide a video 3 showing detection results obtained on a sequence from the validation set. The structure of the frames is similar to the one in Fig.…”
Section: 3D Detection (mentioning)
confidence: 99%
“…2 We calculated these from the precision-recall values published in the KITTI3D leaderboard page. 3 https://research.mapillary.com/publication/MonoDIS…”
Section: 3D Detection (mentioning)
confidence: 99%
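The excerpt above mentions recomputing scores from the precision-recall values published on the KITTI3D leaderboard. As background only (this is a minimal sketch, not the evaluation code used by that work), KITTI-style interpolated average precision over equally spaced recall thresholds can be computed like this; the function name and the toy PR curve are illustrative assumptions:

```python
def interpolated_ap(recalls, precisions, num_points=11):
    """Interpolated average precision over equally spaced recall
    thresholds (KITTI's original protocol used 11 points and later
    moved to 40). `recalls` and `precisions` are parallel lists of
    operating points from a precision-recall curve."""
    total = 0.0
    for i in range(num_points):
        threshold = i / (num_points - 1)
        # Interpolated precision at a threshold is the maximum
        # precision among points whose recall meets the threshold
        # (0 if no operating point reaches that recall).
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / num_points

# Toy PR curve: perfect precision up to 50% recall, then decaying.
recalls = [0.1, 0.3, 0.5, 0.7, 0.9]
precisions = [1.0, 1.0, 1.0, 0.6, 0.4]
ap = interpolated_ap(recalls, precisions)
```

The interpolation step (taking the max precision at or beyond each recall threshold) is what makes the metric monotone in recall, so it matches the published leaderboard convention rather than a raw trapezoidal area.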
“…Xu and Chen [46] proposed to fuse a monocular depth estimation module and achieved high-precision localization. Chabot et al [6] presented Deep MANTA (Deep Many-Tasks) for simultaneous vehicle detection, part localization and visibility characterization, but their method requires part locations and visibility annotations. In this paper, we propose a unified deep learning based pipeline, which does not require additional labels and can be trained end-to-end using a large number of augmented data.…”
Section: Related Work (mentioning)
confidence: 99%
“…2D-driven 3D bounding box (BB) detection methods enlarge the 2D search space using the available appearance and geometry information in the 3D space along with RGB images [90], [151], [153], [154], [144]. The methods presented in [84], [155] directly detect 3D BBs of the objects in a monocular RGB image exploiting contextual models as well as semantics.…”
Section: Introduction (mentioning)
confidence: 99%