2023
DOI: 10.48550/arxiv.2301.06051
Preprint

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Abstract: Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is a fundamental problem in 3D object detection. Compared with customized sparse convolution, the attention mechanism in Transformers is better suited to flexibly modeling long-range relationships and is easier to deploy in real-world applications. However, due to the sparse characteristics of point clouds, it is non-trivial to apply a standard transformer to sparse points. In this paper, we present Dynamic Sp…

Cited by 2 publications (2 citation statements)
References 37 publications (94 reference statements)
“…Due to the recent progress in transformers for computer vision [10], the transformer architecture has also been applied to 3D object detection. Existing works include transformer-based backbones [44,22,12,65,13,73] for the voxel-based representation, [50,91] for the point-based representation, and [21] for a combination of both the point and voxel representation. Furthermore, transformers have been used to improve the detection head [97] and for sensor fusion [2,90,76].…”
Section: LiDAR-based 3D Object Detection (mentioning)
confidence: 99%
“…The cost of structuring data, nevertheless, turned out to be a performance bottleneck for large inputs. Recently, a dynamic sparse voxel transformer (DSVT) was presented by Wang et al. (2023) in an effort to widen the uses of transformers so that they may serve as a solid foundation for outdoor 3D perception just as they do for 2D vision. A number of local regions are split up into smaller ones in each window using DSVT based on sparsity, and each window's attributes are then computed fully in parallel.…”
Section: Related Work (mentioning)
confidence: 99%
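The sparsity-dependent splitting of each window described in the statement above can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation: the function names `partition_window` and `rotated_partitions`, the maximum set size `tau`, the ceil-based set count, and the even index-spreading rule are all assumptions. The non-empty voxels of one window are sorted along one axis, split into as many equal-size sets as their count requires, and padded by repeating voxels so that every set has exactly `tau` members and all sets can be processed as one dense batch. Alternating the sort axis between consecutive attention layers loosely mimics the "rotated sets" in the title.

```python
import math
from typing import List, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) integer voxel coordinates inside one window


def partition_window(voxels: List[Voxel], tau: int, axis: int) -> List[List[int]]:
    """Split the non-empty voxels of one window into equal-size sets.

    Hypothetical sketch: voxel indices are sorted along `axis` (0 = x, 1 = y)
    so each set covers a contiguous local region. The number of sets grows
    with the voxel count (assumed ceil(n / tau)), and each set is padded to
    exactly `tau` indices by repeating voxels, so all sets share one shape.
    """
    n = len(voxels)
    if n == 0:
        return []
    # Sort along the chosen axis, breaking ties with the other horizontal axis, then z.
    order = sorted(range(n), key=lambda i: (voxels[i][axis], voxels[i][1 - axis], voxels[i][2]))
    num_sets = math.ceil(n / tau)
    total_slots = num_sets * tau  # always >= n, so every voxel lands in some slot
    sets = []
    for s in range(num_sets):
        # Spread the n sorted voxels evenly over total_slots; some indices repeat.
        slots = [(s * tau + j) * n // total_slots for j in range(tau)]
        sets.append([order[k] for k in slots])
    return sets


def rotated_partitions(voxels: List[Voxel], tau: int, num_layers: int) -> List[List[List[int]]]:
    """Alternate the sorting axis (x, y, x, ...) across consecutive attention layers."""
    return [partition_window(voxels, tau, axis=layer % 2) for layer in range(num_layers)]


if __name__ == "__main__":
    window = [(3, 0, 0), (0, 1, 0), (1, 4, 0), (2, 2, 0), (4, 3, 0)]
    print(partition_window(window, tau=2, axis=0))  # [[1, 1], [2, 3], [0, 4]]
    print(partition_window(window, tau=2, axis=1))  # [[0, 0], [1, 3], [4, 2]]
```

With five voxels and `tau=2`, the sketch produces three sets of two indices each, with one voxel repeated as padding. Padding by repetition rather than with masked empty slots is a design choice made here only to keep every set the same size without extra masking logic.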