2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00845
Categorical Depth Distribution Network for Monocular 3D Object Detection

Cited by 329 publications (196 citation statements). References 31 publications.
“…Another solution is to transform image features into BEV features and predict 3D bounding boxes from the top-down view. These methods transform image features into BEV features using depth information from depth estimation [46] or a categorical depth distribution [34]. OFT [36] and ImVoxelNet [37] project predefined voxels onto image features to generate the voxel representation of the scene.…”
Section: Camera-based 3D Perception
confidence: 99%
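The depth-distribution lifting described above can be sketched as an outer product between per-pixel image features and a per-pixel softmax over depth bins, which places each feature along the camera frustum weighted by its depth probability. The shapes and values below are illustrative, not taken from any specific implementation:

```python
import numpy as np

# Illustrative shapes: C feature channels, D depth bins, an HxW feature map.
C, D, H, W = 8, 4, 3, 3
rng = np.random.default_rng(0)

features = rng.standard_normal((C, H, W))      # per-pixel image features
depth_logits = rng.standard_normal((D, H, W))  # per-pixel depth-bin scores

# Softmax over depth bins -> a categorical depth distribution per pixel.
shifted = np.exp(depth_logits - depth_logits.max(axis=0, keepdims=True))
depth_prob = shifted / shifted.sum(axis=0, keepdims=True)

# Outer product lifts each pixel's feature into the frustum: channel c at
# depth bin d gets the feature value weighted by P(d | pixel).
frustum = np.einsum('chw,dhw->cdhw', features, depth_prob)

print(frustum.shape)  # (8, 4, 3, 3)
```

Because the depth distribution sums to one at each pixel, summing the frustum volume over depth bins recovers the original feature map; the lifted volume can then be collapsed or resampled into a BEV grid.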
“…The bird's-eye-view (BEV) is a commonly used representation of the surrounding scene since it clearly presents the location and scale of objects and is suitable for various autonomous driving tasks, such as perception and planning [29]. Although previous map segmentation methods demonstrate BEV's effectiveness [32,18,29], BEV-based approaches have not shown significant advantages over other paradigms in 3D object detection [47,31,34]. The underlying reason is that the 3D object detection task requires strong BEV features to support accurate 3D bounding box prediction, but generating BEV from the 2D planes is ill-posed.…”
Section: Introduction
confidence: 99%
“…MonoFENet [1] estimates disparity from the input monocular image, so that the features of both the 2D and 3D streams can be enhanced and utilized for accurate 3D localization. CaDDN [34] uses a predicted categorical depth distribution for each pixel to project rich contextual feature information to the appropriate depth interval in 3D space, and then obtains the final result. M3DSSD [26] proposes a two-step feature alignment approach to overcome feature mismatching.…”
Section: Monocular 3D Object Detection
confidence: 99%
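Projecting features to "the appropriate depth interval" presupposes a discretization of continuous depth into bins. One scheme used in this line of work is linear-increasing discretization (LID), where bin width grows linearly with depth so that the near range is sampled more finely. A minimal sketch, with illustrative parameter values and names:

```python
import numpy as np

def lid_bin_edges(d_min, d_max, num_bins):
    """Linear-increasing discretization: edge i sits at a distance from
    d_min proportional to i*(i+1), so successive bin widths grow linearly."""
    i = np.arange(num_bins + 1)
    return d_min + (d_max - d_min) * i * (i + 1) / (num_bins * (num_bins + 1))

edges = lid_bin_edges(2.0, 46.8, 5)   # illustrative depth range and bin count
widths = np.diff(edges)

print(edges[0], edges[-1])  # 2.0 46.8
print(widths)               # strictly increasing bin widths
```

A per-pixel categorical distribution over these bins is what the network predicts; finer near-range bins reflect that depth errors at close range matter more for 3D localization.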
“…Such monocular methods lack the ability to localize features in a frustum due to depth ambiguity. CaDDN [24] copes with this problem by estimating a depth distribution for each pixel before projecting the features to 3D. Similarly, for stereo, DSGN [4] builds a plane sweep volume (PSV) from stereo image features to estimate per-pixel depth as an auxiliary task in addition to 3D object detection.…”
Section: Monocular and Stereo Detection
confidence: 99%
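The complementary strategy mentioned in the first citation statement, used by OFT- and ImVoxelNet-style methods, goes the other way: predefined 3D voxel centers are projected onto the image plane with the camera intrinsics, and the image features at those pixels fill the voxel grid. A minimal sketch of the projection step (the intrinsics and voxel coordinates below are made up for illustration):

```python
import numpy as np

# Illustrative pinhole intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A few voxel centers in camera coordinates (x right, y down, z forward, meters).
voxels = np.array([[ 1.0,  0.0, 10.0],
                   [-2.0,  0.5, 20.0],
                   [ 0.0, -1.0,  5.0]])

uvw = voxels @ K.T            # homogeneous image coordinates
uv = uvw[:, :2] / uvw[:, 2:]  # perspective divide -> pixel coordinates

print(uv)  # [[370.  240. ], [270.  252.5], [320.  140. ]]
```

Sampling image features at these pixel locations (typically with bilinear interpolation) yields a voxel representation of the scene without requiring an explicit depth estimate; depth ambiguity instead shows up as the same image feature being smeared along each camera ray.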