2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01667

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

Cited by 197 publications (71 citation statements). References 25 publications.
“…Recently, the success of Transformer [10] draws much attention to Transformer-based LiDAR-camera fusion. Specifically, DeepFusion [11] fuses deep camera and LiDAR features instead of decorating raw LiDAR points at the input level. Therein, LearnableAlign is introduced that leverages the cross-attention mechanism to dynamically correlate LiDAR information with the most related camera features.…”
Section: Fusion Methods for 3D Object Detection
confidence: 99%
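
To make the cross-attention idea behind LearnableAlign concrete, here is a minimal PyTorch sketch in which each LiDAR feature queries the camera features gathered for it. The module name, dimensions, and concatenation-based fusion are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionAlign(nn.Module):
    """Hypothetical sketch of a LearnableAlign-style block: each LiDAR
    feature acts as a query attending over the camera features associated
    with it. Naming and sizes are illustrative, not from the paper."""

    def __init__(self, lidar_dim: int, cam_dim: int, embed_dim: int = 128):
        super().__init__()
        self.q_proj = nn.Linear(lidar_dim, embed_dim)
        self.k_proj = nn.Linear(cam_dim, embed_dim)
        self.v_proj = nn.Linear(cam_dim, embed_dim)
        self.out_proj = nn.Linear(lidar_dim + embed_dim, lidar_dim)

    def forward(self, lidar_feat: torch.Tensor, cam_feat: torch.Tensor) -> torch.Tensor:
        # lidar_feat: (N, lidar_dim) voxel/point features
        # cam_feat:   (N, K, cam_dim) camera features gathered for each
        #             LiDAR element via the lidar-to-camera projection
        q = self.q_proj(lidar_feat).unsqueeze(1)            # (N, 1, E)
        k = self.k_proj(cam_feat)                           # (N, K, E)
        v = self.v_proj(cam_feat)                           # (N, K, E)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        aligned = (attn @ v).squeeze(1)                     # (N, E)
        # Fuse by concatenation, then project back to the LiDAR width.
        return self.out_proj(torch.cat([lidar_feat, aligned], dim=-1))

# Toy usage: 2048 LiDAR elements, each paired with 9 camera features.
fuser = CrossAttentionAlign(lidar_dim=64, cam_dim=256)
fused = fuser(torch.randn(2048, 64), torch.randn(2048, 9, 256))
print(fused.shape)  # torch.Size([2048, 64])
```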
“…We also compare ImmFusion with the LiDAR-camera fusion methods, i.e. DeepFusion [11] and TokenFusion [12], which show the state-of-the-art accuracy in the task of 3D object detection. As DeepFusion is a generic Transformer-based block that is incompatible with the dimension-reduction mechanism, we adopt the parametric reconstruction pipeline by replacing the detection framework of DeepFusion with linear projection to regress SMPL-X parameters.…”
Section: E. Comparison With Relevant Methods
confidence: 99%
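
As a rough illustration of swapping a detection framework for parametric regression, the following hypothetical sketch pools fused tokens and linearly projects them to a flat SMPL-X parameter vector. The pooling choice and parameter count are placeholders, not ImmFusion's actual pipeline.

```python
import torch
import torch.nn as nn

class SMPLXRegressionHead(nn.Module):
    """Hypothetical head: pool fused LiDAR-camera tokens and regress a
    flat SMPL-X parameter vector (sizes here are placeholders)."""

    def __init__(self, feat_dim: int = 256, num_params: int = 188):
        super().__init__()
        self.proj = nn.Linear(feat_dim, num_params)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T, feat_dim) fused multi-modal tokens
        return self.proj(tokens.mean(dim=1))  # (B, num_params)

head = SMPLXRegressionHead()
params = head(torch.randn(4, 100, 256))
print(params.shape)  # torch.Size([4, 188])
```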
“…YOLO [81] represented a big leap for object detection in 2D images, and 3D versions have been proposed [82,83,84]. Other approaches based on 3D descriptors [85,86,87] or other deep learning architectures [88,89,90] have also been the subject of research. These approaches do not rely on a priori knowledge about the objects to be recognized, but in human-made objects, it is usual that familiar geometric shapes are prevalent.…”
Section: Applications
confidence: 99%
“…The reasons are twofold: 1) Lidar scans provide comprehensive representations in 3D for inferring correspondences between sensors, and 2) camera images contain more semantic information to further boost the recognition ability. Various directions have been explored, such as image detection in 2D before projecting into frustums [32,54], two-stage frameworks with object-centric modality fusion [7,11,17], image feature-based lidar point decoration [50,51], or multi-level fusion [19,11,31]. Since sparse correspondences between camera and lidar are well defined, fusion is mostly focused on integrating information rather than matching points from different sensors.…”
Section: Introduction
confidence: 99%
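
For context on the "point decoration" direction mentioned above, here is a minimal sketch of projecting LiDAR points into an image feature map and appending the sampled feature to each point. It assumes points already in the camera frame and nearest-neighbor sampling, both simplifications of what decoration methods actually do.

```python
import torch

def decorate_points(points: torch.Tensor, image_feats: torch.Tensor,
                    intrinsics: torch.Tensor) -> torch.Tensor:
    """Hypothetical point-decoration sketch: project LiDAR points onto the
    image plane and concatenate the sampled image feature to each point.
    Assumes points are in camera coordinates (no extrinsics applied)."""
    # points: (N, 3) xyz in camera coordinates, z > 0
    # image_feats: (C, H, W) dense image feature map
    # intrinsics: (3, 3) camera matrix
    uvw = (intrinsics @ points.T).T                 # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                   # perspective divide
    u = uv[:, 0].long().clamp(0, image_feats.shape[2] - 1)
    v = uv[:, 1].long().clamp(0, image_feats.shape[1] - 1)
    sampled = image_feats[:, v, u].T                # (N, C)
    return torch.cat([points, sampled], dim=1)      # (N, 3 + C)

# Toy usage: 1000 points with z in [1, 11], a 64-channel feature map.
pts = torch.rand(1000, 3) * torch.tensor([4.0, 4.0, 10.0]) + torch.tensor([-2.0, -2.0, 1.0])
K = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
decorated = decorate_points(pts, torch.randn(64, 480, 640), K)
print(decorated.shape)  # torch.Size([1000, 67])
```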