2021 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra48506.2021.9561441

Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation

Abstract: Dense depth estimation is essential to scene understanding for autonomous driving. However, recent self-supervised approaches on monocular videos suffer from scale inconsistency across long sequences. Utilizing data from the ubiquitously co-present global positioning systems (GPS), we tackle this challenge by proposing a dynamically-weighted GPS-to-Scale (g2s) loss to complement the appearance-based losses. We emphasize that the GPS is needed only during the multimodal training, and not at inference. The relative …
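
A minimal sketch of the idea may help. The snippet below is a hypothetical PyTorch rendering, assuming the loss penalizes the gap between the norm of the predicted inter-frame translation and the metric displacement computed from GPS between the same frames; the `weight` argument is a stand-in for the paper's dynamic weighting scheme, not its actual schedule.

```python
import torch

def g2s_style_loss(pred_translation, gps_displacement, weight=1.0):
    """Sketch of a GPS-to-Scale style loss (assumed form, not the paper's exact one).

    pred_translation: (B, 3) translations predicted by the ego-motion network
                      for each source->target frame pair.
    gps_displacement: (B,) metric distance between the same frame pairs,
                      derived from GPS positions (used during training only).
    weight:           scalar placeholder for the paper's dynamic weighting.
    """
    pred_norm = pred_translation.norm(dim=1)  # scale of the predicted motion
    return weight * (pred_norm - gps_displacement).abs().mean()
```

Supervising only the translation norm leaves the appearance-based losses to shape the depth itself, while anchoring its scale to metric units.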

Cited by 21 publications (9 citation statements)
References 39 publications
“…Of note is that DynaDepth also achieves a nearly perfect absolute scale. In terms of scale-awareness, even our R18 version outperforms G2S R50 [3], which uses a heavier encoder. For better illustration, we also show the scaling ratio histograms with and without IMU in Fig.…”
Section: Scale-aware Depth Estimation on KITTI
confidence: 94%
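
For context, the scaling ratio plotted in such histograms is conventionally the per-image ratio of median ground-truth depth to median predicted depth (the factor applied in median scaling at evaluation). A minimal NumPy sketch with a hypothetical helper, assuming positive ground-truth depths mark valid pixels:

```python
import numpy as np

def scale_ratios(pred_depths, gt_depths):
    """Per-image scaling ratios (median GT / median prediction) over valid pixels.

    A scale-aware model should concentrate these ratios tightly around 1.0.
    """
    return np.array([
        np.median(gt[gt > 0]) / np.median(pred[gt > 0])
        for pred, gt in zip(pred_depths, gt_depths)
    ])
```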
“…However, the absolute scale is not guaranteed in these methods. Similar to DynaDepth, there exist methods that resort to sensors other than the monocular camera, such as a stereo camera that allows a scale-aware left-right consistency loss [13,14,46], and GPS that provides velocities to constrain the ego-motion network [15,3]. In comparison with these methods, using an IMU is beneficial in that (1) the IMU provides better generalizability since it does not suffer from the visual domain gap, and (2) unlike GPS, which cannot be used indoors, and cameras, which fail in texture-less, dynamic, and illumination-changing scenes, the IMU is more robust to the environment.…”
Section: Scale-aware Depth Learning
confidence: 99%
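
The scale-aware left-right consistency loss mentioned above can be sketched as follows. This is a minimal illustration in the spirit of left-right disparity consistency for rectified stereo pairs, with hypothetical helper names rather than the cited papers' exact formulation.

```python
import torch
import torch.nn.functional as F

def warp_horizontal(src, disp):
    """Sample `src` at pixels shifted horizontally by `disp` (in pixels)."""
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=src.device),
        torch.linspace(-1, 1, w, device=src.device),
        indexing="ij",
    )
    # Convert the pixel-space disparity into normalized grid coordinates.
    x_shifted = xs.unsqueeze(0) - 2.0 * disp.squeeze(1) / (w - 1)
    grid = torch.stack((x_shifted, ys.unsqueeze(0).expand_as(x_shifted)), dim=-1)
    return F.grid_sample(src, grid, align_corners=True, padding_mode="border")

def left_right_consistency_loss(disp_left, disp_right):
    """Penalize disagreement between the left disparity map and the
    right disparity map warped into the left view."""
    disp_right_in_left = warp_horizontal(disp_right, disp_left)
    return (disp_left - disp_right_in_left).abs().mean()
```

Because the stereo baseline is known in metric units, disparities that satisfy this constraint inherit absolute scale, which is the property the excerpt contrasts with GPS- and IMU-based supervision.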
“…Online Benchmark: We also measure the performance of MT-SfMLearner on the KITTI Online Benchmark for depth prediction using the metrics from (Uhrig et al., 2017). We train on an image size of 1024 × 320, and add the G2S loss (Chawla et al., 2021) to obtain predictions at metric scale. Results ordered by their rank are shown in Table 4.…”
Section: Depth Estimation Performance
confidence: 99%
“…However, supervised methods require extensive RGB-D ground truth collected from costly LiDARs or multi-camera rigs. Instead, self-supervised methods have increasingly utilized concepts of Structure from Motion (SfM) with known camera intrinsics to train monocular depth and ego-motion estimation networks simultaneously (Guizilini et al., 2020; Lyu et al., 2020; Chawla et al., 2021). While transformer ingredients such as attention have been utilized for self-supervised depth estimation (Johnston and Carneiro, 2020), most methods are nevertheless limited to the use of CNNs, which have localized linear operations and lose feature resolution during downsampling to increase their limited receptive field (Yang et al., 2021).…”
Section: Introduction
confidence: 99%
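
The SfM-based self-supervision described in this excerpt rests on a view-synthesis objective: the predicted depth of the target frame and the predicted relative pose are used to warp a source frame into the target view, and the photometric error then supervises both networks. A minimal sketch, assuming known intrinsics `K`, a plain L1 photometric error, and hypothetical argument names:

```python
import torch
import torch.nn.functional as F

def photometric_reprojection_loss(img_target, img_source, depth, T, K):
    """Sketch of the SfM-style view-synthesis loss for self-supervised depth.

    img_target, img_source: (B, 3, H, W) consecutive video frames.
    depth: (B, 1, H, W) predicted depth of the target frame.
    T:     (B, 4, 4) predicted relative pose, target -> source.
    K:     (B, 3, 3) camera intrinsics (assumed known).
    """
    b, _, h, w = depth.shape
    device, dtype = depth.device, depth.dtype

    # Pixel grid of the target frame in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=device, dtype=dtype),
        torch.arange(w, device=device, dtype=dtype),
        indexing="ij",
    )
    pix = torch.stack((xs, ys, torch.ones_like(xs)), dim=0).view(1, 3, -1)

    # Back-project to 3D camera points: X = depth * K^{-1} p.
    cam_points = depth.view(b, 1, -1) * (K.inverse() @ pix)

    # Transform into the source frame and project with K.
    ones = torch.ones(b, 1, h * w, device=device, dtype=dtype)
    src_points = (T @ torch.cat((cam_points, ones), dim=1))[:, :3]
    proj = K @ src_points
    xy = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)  # guard near-zero depths

    # Normalize to [-1, 1] and sample the source image at the warped locations.
    grid = torch.stack(
        (2 * xy[:, 0] / (w - 1) - 1, 2 * xy[:, 1] / (h - 1) - 1), dim=-1
    ).view(b, h, w, 2)
    resampled = F.grid_sample(img_source, grid,
                              align_corners=True, padding_mode="border")

    # L1 photometric error between the target frame and the synthesized view.
    return (img_target - resampled).abs().mean()
```

In practice the photometric term is combined with an SSIM component, auto-masking, and a smoothness prior, but the warping step above is where the known intrinsics enter the pipeline.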