2022
DOI: 10.48550/arxiv.2205.07417
Preprint

Transformers in 3D Point Clouds: A Survey

Abstract: In recent years, Transformer models have demonstrated a remarkable ability to model long-range dependencies, achieving strong results in both Natural Language Processing (NLP) and image processing. This success has sparked great interest among researchers in 3D point cloud processing, who have applied Transformers to various 3D tasks. Owing to their inherent permutation invariance and strong global feature learning ability, 3D Transformers are well suited to point cloud processing and analysis…

Cited by 10 publications (14 citation statements). References 104 publications.

“…Early fusion methods, for instance, involve rendering 3D information as multi-view 2D images with an additional depth channel (RGBD), which can then be processed by standard 2D convolutions (Cui et al, 2022) (MVCNN). Alternatively, 2D images can be rendered as a 3D graph, tree, or raster point cloud representation (Lu et al, 2022). However, these 2D methods often lose some 3D geometric context and struggle with per-point label prediction.…”
Section: Related Work (mentioning, confidence: 99%)
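
To make the early-fusion idea in the excerpt above concrete, here is a minimal sketch of projecting a point cloud into a single top-down depth channel that a standard 2D CNN could consume alongside RGB. The projection scheme, function name, and grid size are illustrative assumptions, not details from the cited works.

```python
import numpy as np

def project_to_depth(points: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Orthographic top-down projection of an (N, 3) point cloud into a
    (resolution, resolution) depth image, keeping the highest z per cell."""
    xy = points[:, :2]
    mins, maxs = xy.min(axis=0), xy.max(axis=0)
    # Normalize x, y into integer grid coordinates in [0, resolution - 1].
    cells = ((xy - mins) / (maxs - mins + 1e-9) * (resolution - 1)).astype(int)
    depth = np.full((resolution, resolution), -np.inf)
    for (cx, cy), z in zip(cells, points[:, 2]):
        depth[cy, cx] = max(depth[cy, cx], z)  # nearest point to an overhead camera
    depth[np.isinf(depth)] = 0.0               # background value for empty cells
    return depth

cloud = np.random.rand(2048, 3)          # toy point cloud (hypothetical data)
depth_channel = project_to_depth(cloud)  # stack with RGB for a 2D CNN input
print(depth_channel.shape)               # (64, 64)
```
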
“…Recent advancements in MVCNN networks include ShapeConv (Cao et al, 2021) and FPS-Net (Xiao et al, 2021). On the other hand, late fusion combines the outputs of multiple networks and averages the results, for example, by integrating Point Transformers (Lu et al, 2022) with purely image-based networks. The advantage here is that each modality can be trained separately, leveraging numerous available benchmarks.…”
Section: Related Work (mentioning, confidence: 99%)
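
A minimal sketch of the late-fusion scheme described above, assuming two hypothetical, separately trained classifiers whose class probabilities are simply averaged; neither placeholder reflects the actual networks in the cited papers.

```python
import numpy as np

NUM_CLASSES = 10

def softmax(logits: np.ndarray) -> np.ndarray:
    e = np.exp(logits - logits.max())
    return e / e.sum()

def predict_from_points(points: np.ndarray) -> np.ndarray:
    """Stand-in for a point-based network (e.g., a Point Transformer)."""
    return softmax(np.random.default_rng(0).normal(size=NUM_CLASSES))

def predict_from_image(image: np.ndarray) -> np.ndarray:
    """Stand-in for a purely image-based network."""
    return softmax(np.random.default_rng(1).normal(size=NUM_CLASSES))

points = np.random.rand(1024, 3)         # toy inputs
image = np.random.rand(224, 224, 3)
# Late fusion: each modality is inferred independently, then averaged.
fused = 0.5 * (predict_from_points(points) + predict_from_image(image))
print(int(fused.argmax()))               # class chosen after fusion
```

The design advantage noted in the excerpt follows directly: because fusion happens only at the probability level, each branch can be trained and benchmarked on its own.
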
“…While PointNet mainly relies on local feature learning to aggregate global information progressively, it is still not efficient for robotic grasping, which requires an effective encoding of the global information of an input. In computer vision and graphics, researchers have explored the use of transformer models in point cloud processing [29], such as point cloud segmentation [20], classification [30], and shape completion [31]. However, the number of points in a point cloud input is not fixed and is too high to be processed efficiently with multi-head attention, which is especially serious in a real robot scenario.…”
Section: A 6-DoF Grasping On Point Cloud (mentioning, confidence: 99%)
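
The excerpt's concern, that raw point counts are variable and too large for multi-head attention, is often handled by downsampling to a fixed token budget first. Below is a sketch using farthest point sampling, a common choice; this is an illustrative assumption, not the specific mechanism of the cited grasping work.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Greedily pick k points, each maximizing distance to those already chosen."""
    chosen = [0]                                  # start from an arbitrary point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(dist.argmax())                  # farthest from the current set
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return points[chosen]

cloud = np.random.rand(100_000, 3)                # large, variable-size input
tokens = farthest_point_sampling(cloud, 512)      # fixed-size token set
print(tokens.shape)                               # (512, 3): small enough for attention
```
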
“…Additionally, positional encoding conveys information about token positions (see Figure 2). These benefits have spurred significant interest in transformers across various AI domains [76][77][78][79][80][81][82], notably the audio community. This has given rise to diverse architectures such as Wav2Vec [83], Whisper [84], FastPitch [85], MusicBERT [86], and others [26,87,88].…”
Section: Transformers For Audio Processing (mentioning, confidence: 99%)
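
For concreteness, here is a minimal sketch of the sinusoidal positional encoding the excerpt refers to (the fixed scheme from the original Transformer paper); the sequence length and model width below are illustrative, not taken from the cited audio models.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)); PE[pos, 2i+1] = cos(same).
    Assumes an even d_model."""
    positions = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    div = np.power(10000.0, np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe

tokens = np.random.rand(50, 128)         # e.g., 50 audio-frame embeddings
tokens = tokens + sinusoidal_positional_encoding(50, 128)  # inject position info
```
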