2022
DOI: 10.2139/ssrn.4104342
Upsampling Autoencoder for Self-Supervised Point Cloud Learning

Cited by 12 publications (18 citation statements) | References 57 publications
“…To address this issue, Point-MAE [10] employs a mini-PointNet [22] as the point embedding module to achieve permutation invariance. Similarly, MaskSurf [11] adds a normal prediction module to enhance point cloud understanding. Even though it outperforms Point-MAE on real-world datasets, we argue that normal vectors are not sufficiently robust and descriptive to capture all the nuances in the data.…”
Section: B. Self-supervised Representation Learning
Mentioning confidence: 99%
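The mini-PointNet embedding referenced above amounts to a shared per-point MLP followed by a symmetric pooling over the points of each patch, which is what makes the resulting patch token independent of point ordering. Below is a minimal sketch of such a module in PyTorch; the patch tensor shape (B, G, K, 3) and the layer widths are illustrative assumptions, not Point-MAE's exact configuration.

```python
# Minimal sketch (PyTorch) of a mini-PointNet-style patch embedding.
# Shapes and layer widths are illustrative assumptions, not Point-MAE's exact design.
import torch
import torch.nn as nn

class MiniPointNetEmbedding(nn.Module):
    def __init__(self, embed_dim=384):
        super().__init__()
        # Shared per-point MLP: every point in a patch is processed independently.
        self.mlp = nn.Sequential(
            nn.Linear(3, 128), nn.GELU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, patches):
        # patches: (B, G, K, 3) -> B clouds, G patches per cloud, K points per patch
        feats = self.mlp(patches)          # (B, G, K, embed_dim)
        # Symmetric max-pooling over the K points makes the patch token
        # invariant to the order of points within the patch.
        tokens, _ = feats.max(dim=2)       # (B, G, embed_dim)
        return tokens

# Usage sketch with random patches
tokens = MiniPointNetEmbedding()(torch.randn(2, 64, 32, 3))  # -> (2, 64, 384)
```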
“…Next, we utilize an MLP to embed the center coordinates of the visible patches into positional tokens $L^v_T$. We observe that most existing MAE-based methods [10], [11], [34] leverage the standard Transformer [16] for self-supervised learning, which has quadratic computational complexity and ignores potential correlations between different data samples. Taking the visible feature tokens $T^v$ and positional tokens $L^v_T$ as inputs, we propose an external-attention-based Transformer encoder to excavate deep high-level latent features while minimizing the computational cost.…”
Section: B. External Attention-based Transformer Encoder
Mentioning confidence: 99%
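External attention replaces the token-to-token self-attention map with attention against a small set of learnable external memory units, so the cost grows linearly with the number of tokens and the memories are shared across samples, which is how correlations between different data samples can be exploited. The sketch below assumes PyTorch, illustrative dimensions, and the double-normalization commonly used with external attention; it is not the cited encoder's exact design.

```python
# Minimal sketch (PyTorch) of an external-attention layer with two learnable
# memory units. Dimensions and normalization are illustrative assumptions.
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    def __init__(self, dim=384, mem_size=64):
        super().__init__()
        self.mk = nn.Linear(dim, mem_size, bias=False)  # external key memory
        self.mv = nn.Linear(mem_size, dim, bias=False)  # external value memory

    def forward(self, x):
        # x: (B, N, dim) visible feature tokens plus positional tokens.
        attn = self.mk(x)                  # (B, N, mem_size); cost is linear in N
        attn = attn.softmax(dim=1)         # normalize over the N tokens
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-6)  # l1 over memory slots
        return self.mv(attn)               # (B, N, dim)

# Usage sketch: 128 visible tokens of width 384
out = ExternalAttention()(torch.randn(2, 128, 384))  # -> (2, 128, 384)
```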
“…In addition, some recent approaches [10,37,61] have introduced cross-modal data such as images and text to enhance the pre-training of Masked Point Modeling tasks. There are also some methods [26,62] that improve the objective of the Masked Point Modeling task itself.…”
Section: Masked Autoencoders
Mentioning confidence: 99%