Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
Preprint, 2022
DOI: 10.48550/arxiv.2209.05324
Cited by 5 publications (8 citation statements)
References 0 publications
“…Since monocular methods directly predict 3D objects from single images without considering 3D scene structure, they are more prone to noise [62] and exhibit inferior performance. Besides, BEVFormer performs better than DETR3D, especially under object-level corruptions (e.g., Shear, Rotation), since it captures both semantic and location information of objects in BEV space while being less affected by varying object shapes [31].…”
Section: Results on nuScenes-C
confidence: 99%
“…BEV-based methods [41,42] typically convert 2D image features into BEV features using camera parameters, then directly detect objects on BEV planes. We refer readers to recent surveys [28,34] for more details.…”
Section: Camera-based 3D Object Detection in Autonomous Driving
confidence: 99%
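The statement above describes the core geometric step in camera-based BEV methods: relating ground-plane locations in the ego frame to image pixels via camera parameters. Below is a minimal sketch of that projection under stated assumptions (a flat ground plane at a known height, a single pinhole camera with intrinsics `K` and extrinsics `T_cam_from_ego`); the function name and argument layout are illustrative, not taken from any cited method.

```python
import numpy as np

def bev_grid_to_image(bev_xy, ground_z, K, T_cam_from_ego):
    """Project BEV ground-plane points (ego frame) into image pixel coordinates.

    bev_xy:         (N, 2) array of x/y positions in metres on the ground plane
    ground_z:       assumed z-height of the ground plane in the ego frame
    K:              (3, 3) camera intrinsics matrix
    T_cam_from_ego: (4, 4) rigid transform mapping ego frame -> camera frame
    Returns (uv, in_front): (N, 2) pixel coordinates and a boolean mask of
    points lying in front of the camera.
    """
    n = bev_xy.shape[0]
    # Lift each BEV cell to a homogeneous 3D point on the ground plane.
    pts = np.concatenate(
        [bev_xy, np.full((n, 1), ground_z), np.ones((n, 1))], axis=1
    )
    # Transform into the camera frame, then drop the homogeneous coordinate.
    cam = (T_cam_from_ego @ pts.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6  # only points ahead of the camera project validly
    # Pinhole projection followed by perspective divide.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, in_front
```

A BEV feature map can then be built by sampling image features at the returned `uv` locations for every grid cell; the inverse direction (lifting image features with predicted depth) is what "lift-splat"-style methods do instead.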
“…The run-time/accuracy trade-off of object detection methods intended for deployment on AVs is studied in [198]. More compact scene representations are often used in AV planning, containing either rasterized graphs with local context [199] or BEV representations [124]. This is suitable because planning must occur quickly, but we still believe that articulated human motion ought to be included in the representation.…”
Section: Developments in the Field
confidence: 99%
“…By filtering the data we risk missing something important, such as a partially occluded pedestrian. Therefore, how best to represent a traffic scene for autonomous driving is still an open research topic [123][124][125][126][127]. Within motion planning, High Definition (HD) maps, which contain scene details in a compact representation [128], and Bird's Eye View (BEV) images, i.e., top-view images of the scene, are common because they allow 2D vision models to be easily applied to traffic data [124].…”
Section: Introduction
confidence: 99%