2021
DOI: 10.48550/arxiv.2104.11225
Preprint

Pri3D: Can 3D Priors Help 2D Representation Learning?

Abstract: Figure 1: Pri3D leverages 3D priors for downstream 2D image understanding tasks: during pre-training, we incorporate view-invariant and geometric priors from color-geometry information given by RGB-D datasets, imbuing geometric priors into learned features. We show that these 3D-imbued learned features can effectively transfer to improved performance on 2D tasks such as semantic segmentation, object detection, and instance segmentation.
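The caption describes pre-training that pulls together the features of pixels observing the same 3D point across views. A minimal, hypothetical PyTorch sketch of such a view-invariant contrastive objective (not the authors' released code; the function name and the assumption of precomputed pixel correspondences are ours) might look like:

import torch
import torch.nn.functional as F

def view_invariant_contrastive_loss(feat_a, feat_b, idx_a, idx_b, temperature=0.07):
    # feat_a, feat_b: (C, H, W) per-pixel features of two views from a shared 2D encoder.
    # idx_a, idx_b: (N,) flat pixel indices of corresponding pixels, e.g. obtained by
    # projecting shared 3D points from an RGB-D scan into both views (assumed precomputed).
    C = feat_a.shape[0]
    za = F.normalize(feat_a.reshape(C, -1)[:, idx_a].t(), dim=1)  # (N, C)
    zb = F.normalize(feat_b.reshape(C, -1)[:, idx_b].t(), dim=1)  # (N, C)
    logits = za @ zb.t() / temperature      # (N, N): row i has its positive at column i
    target = torch.arange(za.shape[0], device=za.device)
    return F.cross_entropy(logits, target)  # InfoNCE over pixel correspondences

Treating each matched pixel pair as the positive and all other pixels in the batch as negatives is what makes the learned features view-invariant: the encoder is rewarded for producing the same feature regardless of the viewpoint from which a surface point is seen.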


Paper Sections

Select...
3
1
1

Citation Types

0
11
0

Year Published

2021
2021
2023
2023

Publication Types

Select...
4
1

Relationship

0
5

Authors

Journals

Cited by 5 publications (11 citation statements)
References 58 publications
“…3D-to-2D Distillation uses an additional 3D network in the training phase to leverage 3D features that complement the RGB inputs for 2D feature extraction. Pri3D [Hou et al. 2021] tried to imbue image-based perception with learned view-invariant, geometry-aware representations based on multi-view RGB-D data for 2D downstream tasks. To overcome the difficulty of cross-modality feature association, DeepI2P [Li and Lee 2021] designed a neural network to convert the registration problem into a classification and inverse camera projection optimization problem.…”
Section: Cross-modality Feature Extraction and Fusion
confidence: 99%
“…For example, xMUDA [58] utilizes aligned images and point clouds to transfer 2D feature-map information for 3D semantic segmentation through knowledge distillation [59]. For cross-modal transfer learning [60], Liu et al. [61] proposed pixel-to-point knowledge transfer (PPKT) from 2D to 3D, which uses aligned RGB and RGB-D images during pre-training. Our work does not rely on joint image-point-cloud pretraining.…”
Section: Cross-modal Learning
confidence: 99%
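As a rough illustration of the pixel-to-point transfer idea quoted above, the following hypothetical PyTorch sketch distills features from a frozen 2D backbone into a 3D network at corresponding pixels. PPKT itself uses a contrastive pixel-to-point loss; this simpler cosine-distillation variant, and all names in it, are assumptions for illustration:

import torch
import torch.nn.functional as F

def pixel_to_point_distillation(img_feat, point_feat, uv):
    # img_feat: (C, H, W) features from a frozen, pre-trained 2D backbone (the teacher).
    # point_feat: (N, C) features from the 3D network being trained (the student).
    # uv: (N, 2) integer pixel coordinates (u, v) where each 3D point projects.
    teacher = img_feat[:, uv[:, 1], uv[:, 0]].t()         # (N, C) 2D features at projections
    teacher = F.normalize(teacher, dim=1).detach()        # teacher receives no gradient
    student = F.normalize(point_feat, dim=1)
    return (1.0 - (teacher * student).sum(dim=1)).mean()  # mean cosine distance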
“…With recent breakthroughs in deep learning and the increasing prominence of RGB-D data, the computer vision community has made tremendous progress on analyzing point clouds [35] and images [20,19]. Recently, we have observed rapid progress in cross-modality learning between geometry and color [23,28,27,44,9,7]. However, these works mainly focus on high-level semantic scene understanding tasks, such as semantic/instance segmentation [13,26] and object detection [33].…”
Section: Introduction
confidence: 99%
“…Furthermore, another 2D pre-trained neural network summarizes the features from the pixels in each region. We investigate the effectiveness of 2D pre-trained features in the 3D task by trying different 2D pre-trained backbones, such as models pre-trained on ImageNet or with Pri3D [23]. We note that the 2D models are pre-trained on different datasets.…”
Section: Introduction
confidence: 99%
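In its simplest form, summarizing per-region 2D features as described in that excerpt amounts to average-pooling backbone features over the pixels of each region. A hypothetical PyTorch sketch, assuming region assignments arrive as an integer mask (the helper name is ours):

import torch

def summarize_region_features(img_feat, region_mask, num_regions):
    # img_feat: (C, H, W) per-pixel features from a pre-trained 2D backbone.
    # region_mask: (H, W) long tensor with values in [0, num_regions) assigning
    # every pixel to a region.
    C = img_feat.shape[0]
    flat = img_feat.reshape(C, -1).t()                  # (H*W, C)
    ids = region_mask.reshape(-1)                       # (H*W,)
    sums = torch.zeros(num_regions, C, device=img_feat.device).index_add_(0, ids, flat)
    counts = torch.bincount(ids, minlength=num_regions).clamp(min=1)
    return sums / counts.unsqueeze(1).float()           # (num_regions, C) region means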