2021
DOI: 10.48550/arxiv.2110.06922
Preprint

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries

Abstract: We introduce a framework for multi-camera 3D object detection. In contrast to existing works, which estimate 3D bounding boxes directly from monocular images or use depth prediction networks to generate input for 3D object detection from 2D information, our method manipulates predictions directly in 3D space. Our architecture extracts 2D features from multiple camera images and then uses a sparse set of 3D object queries to index into these 2D features, linking 3D positions to multi-view images using camera tr…
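The step of linking a 3D query position to pixels in several cameras can be illustrated with a small sketch. It assumes each camera is described by a 4×4 row-major matrix that maps points in a shared 3D frame to homogeneous image coordinates (the abstract is truncated before it specifies the camera model), and that a point is usable only if it lies in front of the camera and inside the image. The names `project_point` and `project_to_all_cameras` are hypothetical; this is not the paper's implementation.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed camera model: 4x4 row-major matrix mapping a 3D point (x, y, z, 1)
# to homogeneous image coordinates (u*d, v*d, d, 1).
Matrix4 = Sequence[Sequence[float]]


def project_point(point: Tuple[float, float, float], proj: Matrix4,
                  width: int, height: int,
                  eps: float = 1e-5) -> Optional[Tuple[float, float]]:
    """Project one 3D point into one camera; return None if it is not visible."""
    x, y, z = point
    homo = (x, y, z, 1.0)
    # Rows 0..2 of the matrix-vector product give (u*d, v*d, d).
    u_d, v_d, d = (sum(proj[r][c] * homo[c] for c in range(4)) for r in range(3))
    if d <= eps:  # point is behind (or on) the image plane
        return None
    u, v = u_d / d, v_d / d  # perspective divide
    if not (0 <= u < width and 0 <= v < height):  # outside the image
        return None
    return (u, v)


def project_to_all_cameras(point: Tuple[float, float, float],
                           projs: List[Matrix4], width: int,
                           height: int) -> List[Optional[Tuple[float, float]]]:
    """Return one pixel location (or None) per camera for a single 3D query point."""
    return [project_point(point, p, width, height) for p in projs]
```

With two hypothetical cameras facing opposite directions, a point straight ahead of the first camera projects to its image center and is reported as not visible (`None`) in the second; most query points in a surround-view rig are visible in only one or two cameras.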

Cited by 6 publications (11 citation statements)
References 36 publications

“…Benefitting from the strong spatial correlation of the targets' attribute with the image appearance, it works well in predicting this but is relatively poor in perceiving the targets' translation, velocity, and depth. Following DETR [3], DETR3D [44] proposes to detect 3D objects in an attention pattern, which achieves similar performance as FCOS3D but with a smaller computing budget and faster inference speed. PGD [43] further develops the FCOS3D paradigm by searching and resolving with the outstanding shortcoming (i.e.…”
Section: Visual Based 3D Object Detection (mentioning, confidence: 99%)

“…nuScenes dataset includes 1000 scenes with images from 6 cameras with surrounding views, points from 5 Radars and 1 LiDAR. It is the up-to-date popular benchmark for 3D object detection [42,44,43,29] and BEV semantic segmentation [34,30,28,47]. The scenes are officially split into 700/150/150 scenes for training/validation/testing.…”
Section: Experimental Settings (mentioning, confidence: 99%)

“…However, Grassia [15] states that Euler angles and quaternions representation [16] are not ideal to compute and differentiate positions, due to the discontinuity [17] and ambiguity problem. We still observe recent researchers such as [18,19,20,21,22,23,24,25,26,27,28,29,30] using these types of scalar-based angle representation in 3D Object Detection. We are also noticing some variants of the scalar-based methods.…”
Section: -D Scalar Representations (mentioning, confidence: 93%)

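The discontinuity mentioned in this excerpt can be seen with a single heading (yaw) angle. In the hypothetical helpers below, a naive scalar loss treats yaw angles of 179° and -179° as almost 2π apart even though the two orientations differ by only 2°. Encoding the angle as (sin, cos), one commonly used alternative that is not taken from the quoted paper, keeps nearby orientations close.

```python
import math


def scalar_yaw_error(pred: float, target: float) -> float:
    """Naive L1 error on raw yaw angles in radians (ignores the 2*pi wrap-around)."""
    return abs(pred - target)


def wrapped_yaw_error(pred: float, target: float) -> float:
    """True angular distance in [0, pi], accounting for the 2*pi period."""
    d = (pred - target) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def sincos_error(pred: float, target: float) -> float:
    """Euclidean distance between (sin, cos) encodings; continuous in the angle."""
    return math.hypot(math.sin(pred) - math.sin(target),
                      math.cos(pred) - math.cos(target))


a, b = math.radians(179.0), math.radians(-179.0)
print(scalar_yaw_error(a, b))   # about 6.25 rad, i.e. 358 degrees
print(wrapped_yaw_error(a, b))  # about 0.035 rad, i.e. 2 degrees
print(sincos_error(a, b))       # about 0.035, close to the wrapped distance
```

The (sin, cos) distance is the chord length 2·sin(Δ/2), so for small differences it closely tracks the true angular gap, and its gradient does not jump when the angle crosses ±π.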
“…Transformer-based approaches are another line of research for perception in BEV space. In object detection task, DETR3D [8] introduces a 3D bounding boxes detection method that directly generates predictions in 3D space from 2D features of multiple camera images. The view transformation between 3D space and 2D image space is achieved by 3D-to-2D queries of a cross-attention module.…”
Section: Introduction (mentioning, confidence: 99%)

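The 3D-to-2D query described in this excerpt can be sketched as three steps: project a query's 3D reference point into every camera, bilinearly sample that camera's feature map at the projected pixel, and average the samples over the cameras in which the point is visible. The sketch below is a simplified assumption, not the paper's code. It takes already-projected pixel coordinates (`None` for cameras where the point is not visible), uses a single feature map per camera stored as nested `[H][W][C]` lists, and omits learned weights, multi-scale features, and the rest of the attention module. `bilinear_sample` and `gather_query_feature` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

FeatureMap = Sequence[Sequence[Sequence[float]]]  # indexed as [row][col][channel]


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a C-dim feature at continuous pixel (u, v).

    u indexes columns (width), v indexes rows (height). Coordinates are assumed
    non-negative and inside the map; neighbours past the border are clamped.
    """
    h, w = len(fmap), len(fmap[0])
    x0, y0 = int(u), int(v)  # floor, valid because u, v >= 0
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0  # fractional offsets used as interpolation weights
    out = []
    for c in range(len(fmap[0][0])):
        top = (1 - ax) * fmap[y0][x0][c] + ax * fmap[y0][x1][c]
        bottom = (1 - ax) * fmap[y1][x0][c] + ax * fmap[y1][x1][c]
        out.append((1 - ay) * top + ay * bottom)
    return out


def gather_query_feature(pixels: Sequence[Optional[Tuple[float, float]]],
                         fmaps: Sequence[FeatureMap]) -> Optional[List[float]]:
    """Average features sampled from every camera in which the query is visible."""
    samples = [bilinear_sample(f, *uv) for uv, f in zip(pixels, fmaps) if uv is not None]
    if not samples:
        return None  # the query point is not visible in any camera
    # Channel-wise mean over the visible cameras only.
    return [sum(vals) / len(samples) for vals in zip(*samples)]
```

Averaging only over visible cameras means a query that projects into a single camera still receives a full-strength feature, while a query that falls outside every image gets `None` and would need a fallback such as a zero vector.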