2022
DOI: 10.1007/978-3-031-19769-7_22
Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion

Cited by 8 publications (5 citation statements)
References 59 publications
“…4 reports the comparison on two panoramic depth completion datasets. We observe that, with the fewest parameters, RigNet++ is still significantly superior to UniFuse [30], HoHoNet [59], GuideNet [63], 360Depth [53], and M³PT [71]. For example, compared to M³PT, which uses an additional masked pre-training strategy [19], RigNet++ still achieves 15.1% lower RMSE on average and higher δᵢ, although their REL metrics are marginally close.…”
Section: Evaluation On Indoor NYUv2
confidence: 87%
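The RMSE, REL, and δᵢ figures quoted above are the standard depth-completion metrics. As a point of reference, a minimal sketch of how they are typically computed (the function name `depth_metrics` is ours, not from either paper):

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard depth metrics: root-mean-square error (RMSE),
    mean absolute relative error (REL), and threshold accuracies
    delta_i = fraction of pixels with max(pred/gt, gt/pred) < 1.25**i."""
    valid = gt > eps                      # evaluate only where ground truth exists
    p, g = pred[valid], gt[valid]
    rmse = np.sqrt(np.mean((p - g) ** 2))
    rel = np.mean(np.abs(p - g) / g)
    ratio = np.maximum(p / g, g / p)
    out = {"rmse": float(rmse), "rel": float(rel)}
    for i in (1, 2, 3):
        out[f"delta_{i}"] = float(np.mean(ratio < 1.25 ** i))
    return out
```

Under these definitions, lower RMSE/REL and higher δᵢ indicate better completion quality, which is how the comparison above should be read.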
“…The latest Matterport3D (512 × 256) contains 7,907 panoramic RGB-D pairs, of which 5,636 are for training, 744 for validation, and 1,527 for testing. For panoramic depth completion, M³PT [71] proposes to synthesize the sparse depth. It first projects the equirectangular ground-truth depth into a cubical map to remove the distortion.…”
Section: Datasets and Metrics
confidence: 99%
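The sparse-depth synthesis described above can be approximated in spirit by randomly keeping a fixed number of valid pixels from the dense ground truth, which is a common way to simulate sparse sensor input. This sketch omits the equirectangular-to-cube projection step the excerpt mentions, and the function name `synthesize_sparse_depth` is an assumption of ours, not M³PT's actual pipeline:

```python
import numpy as np

def synthesize_sparse_depth(dense_depth, n_samples=500, rng=None):
    """Keep only n_samples randomly chosen valid pixels of a dense
    depth map, zeroing everything else, to mimic sparse depth input."""
    rng = np.random.default_rng(rng)
    valid = np.flatnonzero(dense_depth > 0)        # flat indices of valid depths
    keep = rng.choice(valid, size=min(n_samples, valid.size), replace=False)
    sparse = np.zeros_like(dense_depth)
    sparse.flat[keep] = dense_depth.flat[keep]
    return sparse
```

Sampling on the distortion-free cube faces (as M³PT does) rather than directly on the equirectangular image avoids oversampling near the poles, where equirectangular pixels are heavily stretched.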
“…Most existing middle-fusion methods exploit only simple concatenation or summation operations, which cannot effectively fuse multi-modal information. Some recent works have explored more effective fusion methods, such as the guided convolution module [16], multi-modal masked pre-training (M³PT) [24], adaptive symmetric gated fusion [17], and channel shuffle [20]. Although PENet [22] considered both early and late fusion, it only used simple concatenation operations.…”
Section: Multi-modal Fusion
confidence: 99%
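To make the contrast concrete, a middle-fusion step richer than plain concatenation or summation can be sketched as a learned gate that mixes the RGB and depth feature streams. This is an illustrative NumPy sketch under our own assumptions (the gate parameters `w_gate`/`b_gate` would normally be learned, and this is not the exact module from any of the cited papers):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb_feat, depth_feat, w_gate, b_gate):
    """Gated middle fusion: a gate computed from both modalities decides,
    per element, how much of each feature stream to keep.
    Plain middle fusion would instead be np.concatenate([...], axis=-1)
    or rgb_feat + depth_feat, with no adaptivity."""
    both = np.concatenate([rgb_feat, depth_feat], axis=-1)   # (N, 2C)
    gate = sigmoid(both @ w_gate + b_gate)                   # (N, C), values in (0, 1)
    return gate * rgb_feat + (1.0 - gate) * depth_feat
```

Because the gate is a convex combination, each fused element lies between the corresponding RGB and depth feature values, letting the network lean on whichever modality is more reliable at that location.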
“…Considering that the RGB and depth data modalities have different statistical properties, various fusion methods [4,[15][16][17][18][19][20][21][22][23][24][25] have been proposed to eliminate the modal distinction and better fuse RGB and depth information. Jaritz et al. [15] demonstrated that middle fusion performs better than the early fusion strategy.…”
Section: Introduction
confidence: 99%