2022
DOI: 10.48550/arxiv.2206.14451
Preprint
SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving

Cited by 1 publication (2 citation statements)
References 0 publications
“…Multi-camera 3D object detection predicts the 3D bounding boxes of the objects of interest from the input surrounding views. Motivated by typical works in 2D detection [6,55,38], researchers combine 3D priors and propose different 3D object detection frameworks to directly achieve sparse object-level feature extraction [45,9,37,11]. In recent new paradigms of autonomous driving, the BEV space attracts much attention because of its advantages in perception [35,47,56,7,33], prediction [1,14], multi-task learning [46,51,28,5], downstream planning [33], etc.…”
Section: Related Work
confidence: 99%
“…3D object detection from multi-camera 2D images is a critical perception technique for autonomous driving systems, compared to expensive LiDAR-based [10,18,3] or multi-modal approaches [42,41,50,49,2,30,12]. Recent approaches emphasize transforming 2D image features to sparse instance-level [9,37,45] or dense Bird's Eye View (BEV) representations [16,22,26], characterizing the 3D structure of the surrounding environment. Although some depth-based detectors [16,17,21,26,51] incorporate depth estimation to introduce such 3D information, extra depth supervision is required for more precise detection.…”
Section: Introduction
confidence: 99%
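The "dense BEV" idea mentioned in the citation statement above can be sketched with a minimal lift-and-splat step: weight per-pixel image features by a predicted depth distribution, back-project each pixel at each depth bin with a pinhole model, and accumulate the result into a bird's-eye-view grid. This is an illustrative sketch only, not the SRCN3D method; all names, shapes, and the simple pinhole geometry are assumptions.

```python
import numpy as np

def lift_splat_bev(feats, depth_probs, fx, cx, depth_bins,
                   bev_size=32, bev_range=40.0):
    """Illustrative lift-and-splat (assumed shapes, not SRCN3D itself).

    feats:       (C, H, W) image feature map.
    depth_probs: (D, H, W) softmax over D discrete depth bins per pixel.
    fx, cx:      assumed pinhole intrinsics (focal length, principal point).
    depth_bins:  list of D metric depths, one per bin.
    Returns a (C, bev_size, bev_size) BEV feature grid covering
    [-bev_range, bev_range] laterally and [0, 2*bev_range] forward.
    """
    C, H, W = feats.shape
    bev = np.zeros((C, bev_size, bev_size))
    cell = (2.0 * bev_range) / bev_size          # metres per BEV cell
    for d_idx, z in enumerate(depth_bins):       # each discrete depth
        for u in range(W):                       # each image column
            x = (u - cx) * z / fx                # pinhole back-projection
            ix = int((x + bev_range) // cell)    # lateral BEV index
            iz = int(z // cell)                  # forward BEV index
            if 0 <= ix < bev_size and 0 <= iz < bev_size:
                # weight the column's features by its mean depth
                # probability for this bin, pooling over image rows
                w = depth_probs[d_idx, :, u].mean()
                bev[:, iz, ix] += w * feats[:, :, u].mean(axis=1)
    return bev
```

In practice such splatting is done with vectorized scatter operations rather than Python loops, and the depth distribution is learned end-to-end; this loop version only makes the geometry explicit.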