2022
DOI: 10.48550/arxiv.2207.08536
Preprint

UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View

Abstract: Bird's eye view (BEV) representation is a new perception formulation for autonomous driving, which is based on spatial fusion. Further, temporal fusion is also introduced in BEV representation and gains great success. In this work, we propose a new method that unifies both spatial and temporal fusion and merges them into a unified mathematical formulation. The unified fusion could not only provide a new perspective on BEV fusion but also brings new capabilities. With the proposed unified spatial-temporal fusio…

Cited by 2 publications (2 citation statements)
References 20 publications
“…BEVDet4D [64] fuses the previous feature maps with the current frame using a spatial alignment operation followed by a concatenation of multiple feature maps. BEV-Former [4] and UniFormer [126] adopt a soft way to fusion temporal information. The attention module is utilized to fuse temporal information from previous BEV feature maps and previous frames, respectively.…”
Section: Temporal Fusion
type: mentioning
confidence: 99%
“…PETR and BEVFormer utilize the end-to-end 3D detection head in DETR [6], treating all BEV grids as dense queries. For detecting occluded and dynamic objects better, multi-frame temporal information is also introduced into the origin detection architecture [15,26,34].…”
Section: Related Work
type: mentioning
confidence: 99%