2021
DOI: 10.48550/arxiv.2104.11896
Preprint

M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

Abstract: We present a novel architecture for 3D object detection, M3DETR, which combines different point cloud representations (raw points, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids. M3DETR is the first approach that unifies multiple point cloud representations and feature scales, and simultaneously models mutual relationships between point clouds, using transformers. We perform extensive ablation experiments that highlight the benefits of fusing representation and scale, an…
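The fusion described in the abstract (per-representation, per-scale features whose mutual relations are modeled jointly by transformers) can be illustrated with a minimal sketch. The module below is not the authors' M3DETR implementation: the class name MultiRepTransformerFusion, the token counts, and the embedding sizes are assumptions, and the branch feature extractors are stubbed with random tensors.

```python
# A minimal sketch (not the authors' M3DETR code): fuse per-representation,
# per-scale point cloud features with a standard transformer encoder.
# Feature extractors are stubbed with random tensors; all names, token
# counts, and sizes below are illustrative assumptions.
import torch
import torch.nn as nn


class MultiRepTransformerFusion(nn.Module):
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        # Learned embeddings tell the encoder which representation and which
        # pyramid scale each token came from.
        self.rep_embed = nn.Embedding(3, dim)    # raw points / voxels / BEV
        self.scale_embed = nn.Embedding(4, dim)  # up to 4 pyramid scales

    def forward(self, tokens, rep_ids, scale_ids):
        # tokens: (B, N, dim) features pooled from every representation/scale
        x = tokens + self.rep_embed(rep_ids) + self.scale_embed(scale_ids)
        # Self-attention models mutual relations across all tokens at once.
        return self.encoder(x)


if __name__ == "__main__":
    B, dim = 2, 256
    # Hypothetical token counts per branch.
    raw_feats = torch.randn(B, 128, dim)    # raw-point branch
    voxel_feats = torch.randn(B, 64, dim)   # voxel branch
    bev_feats = torch.randn(B, 32, dim)     # bird's-eye-view branch
    tokens = torch.cat([raw_feats, voxel_feats, bev_feats], dim=1)
    rep_ids = torch.cat([torch.full((128,), 0), torch.full((64,), 1),
                         torch.full((32,), 2)]).unsqueeze(0).expand(B, -1)
    scale_ids = torch.zeros(B, tokens.shape[1], dtype=torch.long)
    fused = MultiRepTransformerFusion(dim)(tokens, rep_ids, scale_ids)
    print(fused.shape)  # torch.Size([2, 224, 256])
```

In the actual method, the tokens would come from dedicated point, voxel, and bird's-eye-view backbones; the sketch only shows how a single transformer encoder can attend across all representations and scales at once.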

Cited by 5 publications (5 citation statements)
References 56 publications
“…Secondly, our current two-stage refinement modules only use features from the bird-eye view which may not take full advantage of the high-resolution virtual points generated from our algorithm. We believe point or voxel-based two-stage 3D detectors like PVRCNN [45] and M3Detr [15] may give more significant improvements. Finally, the point-based abstraction connecting 2D and 3D detection may introduce too large of a bottleneck to transmit information from 2D to 3D.…”
Section: Discussion (mentioning)
confidence: 98%
“…Alternatively, PSNet [44] proposes a fast data structuring method to tackle the data structuring issue in point-based methods. There are also other methods that combine both point-wise and voxel-wise features in a local field [10], [45] or a global field [46]-[48]. In addition, some of these approaches propose using continuous convolution on the 3D point clouds by the designed kernel structure.…”
Section: Related Work (mentioning)
confidence: 99%
“…Consequently, Transformer architectures have also started to emerge for point cloud processing. For instance, [24] and [25] employ Transformers for 3D object detection, whereas [26] developed a method for point cloud segmentation. However, due to the complex nature of outdoor LiDAR data, Transformer models are yet to establish themselves on outdoor benchmarks.…”
Section: Related Work (mentioning)
confidence: 99%