Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
Preprint, 2022
DOI: 10.48550/arxiv.2209.05324
Cited by 5 publications (8 citation statements)
References 0 publications
“…Since monocular methods directly predict 3D objects from single images without considering 3D scene structure, they are more prone to noise [62] and exhibit inferior performance. Besides, BEVFormer performs better than DETR3D, especially under object-level corruptions (e.g., Shear, Rotation), since it captures both semantic and location information of objects in BEV space while being less affected by varying object shapes [31].…”
Section: Results on nuScenes-C
confidence: 99%
“…BEV-based methods [41,42] typically convert 2D image features into BEV features using camera parameters, then directly detect objects on BEV planes. We refer readers to recent surveys [28,34] for more details.…”
Section: Camera-based 3D Object Detection in Autonomous Driving
confidence: 99%
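The statement above describes the core geometric step in camera-based BEV methods: relating ground-plane locations in the ego frame to image pixels via camera parameters. Below is a minimal sketch of that projection under stated assumptions (a flat ground plane at a known height, a single pinhole camera with intrinsics `K` and extrinsics `T_cam_from_ego`); the function name and argument layout are illustrative, not taken from any cited method.

```python
import numpy as np

def bev_grid_to_image(bev_xy, ground_z, K, T_cam_from_ego):
    """Project BEV ground-plane points (ego frame) into image pixel coordinates.

    bev_xy:         (N, 2) array of x/y positions in metres on the ground plane
    ground_z:       assumed z-height of the ground plane in the ego frame
    K:              (3, 3) camera intrinsics matrix
    T_cam_from_ego: (4, 4) rigid transform mapping ego frame -> camera frame
    Returns (uv, in_front): (N, 2) pixel coordinates and a boolean mask of
    points lying in front of the camera.
    """
    n = bev_xy.shape[0]
    # Lift each BEV cell to a homogeneous 3D point on the ground plane.
    pts = np.concatenate(
        [bev_xy, np.full((n, 1), ground_z), np.ones((n, 1))], axis=1
    )
    # Transform into the camera frame, then drop the homogeneous coordinate.
    cam = (T_cam_from_ego @ pts.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6  # only points ahead of the camera project validly
    # Pinhole projection followed by perspective divide.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, in_front
```

A BEV feature map can then be built by sampling image features at the returned `uv` locations for every grid cell; the inverse direction (lifting image features with predicted depth) is what "lift-splat"-style methods do instead.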
“…The run-time/accuracy trade-off of object detection methods intended for deployment on AVs is studied in [198]. More compact scene representations are often used in AV planning, containing either rasterized graphs with local context [199] or BEV representations [124]. This is suitable because planning must occur quickly, but we still believe that articulated human motion ought to be included in the representation.…”
Section: Developments in the Field
confidence: 99%
“…By filtering the data we risk missing something important, such as a partially occluded pedestrian. Therefore, how best to represent a traffic scene for autonomous driving is still an open research topic [123][124][125][126][127]. Within motion planning, High Definition (HD) maps, which contain scene details in a compact representation [128], and Bird's Eye View (BEV) images, i.e., top-view images of the scene, are common because they allow 2D vision models to be easily applied to traffic data [124].…”
Section: Introduction
confidence: 99%