2019
DOI: 10.1109/lra.2019.2896963
Geo-Supervised Visual Depth Prediction

Abstract: We propose using global orientation from inertial measurements, and the bias it induces on the shape of objects populating the scene, to inform visual 3D reconstruction. We test the effect of using the resulting prior in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or be orthogonal to it. Adding such a prior to baseline methods for monocular depth prediction yields improvements beyond the state-of-the-art and illustrates the…
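The prior described in the abstract admits a compact illustration. Below is a minimal NumPy sketch, not the authors' implementation: surface normals are estimated from a predicted depth map, pixels of assumed "horizontal" classes (e.g., ground, road) are penalized for deviating from the gravity direction, and pixels of assumed "vertical" classes (e.g., walls) are penalized for failing to be orthogonal to it. The intrinsics K, the unit gravity vector, and the per-class masks are hypothetical inputs.

```python
# Minimal NumPy sketch of a gravity-alignment prior of the kind described in
# the abstract. Names (K, gravity, class masks) are illustrative assumptions,
# not the authors' implementation.
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to 3D points (H, W, 3) using intrinsics K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T      # per-pixel viewing rays
    return rays * depth[..., None]       # scale each ray by its depth

def surface_normals(points):
    """Approximate per-pixel unit normals via finite differences."""
    dx = np.gradient(points, axis=1)
    dy = np.gradient(points, axis=0)
    n = np.cross(dx, dy)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)

def gravity_prior(depth, K, gravity, horizontal_mask, vertical_mask):
    """Penalize normals of 'horizontal' classes that deviate from gravity and
    normals of 'vertical' classes that fail to be orthogonal to it."""
    n = surface_normals(backproject(depth, K))
    cos = np.abs(n @ gravity)            # |n . g| per pixel, g is a unit vector
    horiz = (1.0 - cos)[horizontal_mask].mean() if horizontal_mask.any() else 0.0
    vert = cos[vertical_mask].mean() if vertical_mask.any() else 0.0
    return horiz + vert
```

In a learned setting, a scalar like this would typically be added with a small weight to the usual depth-supervision or photometric objective.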

Cited by 61 publications (50 citation statements)
References 29 publications
“…Our VGG11 and VGG8 architectures follow the late fusion paradigm [14,28], and our auxiliary pose network predicts the relative pose between two frames for constructing our photometric and pose consistency losses (Eqns. 6, 8). Our auxiliary pose network is used only in training and not at inference.…”
Section: Late Fusion VGG11 (mentioning)
confidence: 99%
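The photometric consistency loss referenced in this excerpt is not reproduced here, so the following is only a rough NumPy sketch of the general view-synthesis idea behind such losses: the target depth and an assumed relative pose (R, t) warp the source image into the target view, and the per-pixel L1 difference is averaged over valid pixels. Nearest-neighbour sampling and all variable names are simplifying assumptions, not the cited paper's formulation.

```python
# Rough NumPy sketch of a warping-based photometric consistency term.
# Nearest-neighbour sampling keeps the example short; real implementations
# typically sample bilinearly.
import numpy as np

def photometric_loss(I_src, I_tgt, depth_tgt, K, R, t):
    """I_src, I_tgt: (H, W, 3) images; depth_tgt: (H, W) target depth;
    (R, t): relative pose mapping target-frame points into the source frame."""
    H, W = depth_tgt.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    pts = (pix @ np.linalg.inv(K).T) * depth_tgt[..., None]  # target-frame 3D
    pts_src = pts @ R.T + t                                  # source frame
    proj = pts_src @ K.T                                     # project to pixels
    us = proj[..., 0] / (proj[..., 2] + 1e-8)
    vs = proj[..., 1] / (proj[..., 2] + 1e-8)
    valid = (us >= 0) & (us < W) & (vs >= 0) & (vs < H) & (proj[..., 2] > 0)
    ui = np.clip(np.round(us).astype(int), 0, W - 1)
    vi = np.clip(np.round(vs).astype(int), 0, H - 1)
    I_warp = I_src[vi, ui]                                   # sample source colours
    return np.abs(I_warp - I_tgt)[valid].mean()
```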
“…Our auxiliary pose network contains ≈1M parameters and is used only during training to construct the photometric and pose consistency losses (Eqns. 6, 8). The output is averaged along its width and height dimensions to result in a 6-element vector, of which 3 elements are used to compose rotation (Sec.…”
Section: Appendix A: VOID Dataset (mentioning)
confidence: 99%
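The excerpt describes averaging the pose network's output over its spatial dimensions into a 6-element vector whose first 3 elements compose a rotation. One common way to realize this, shown as a hedged NumPy sketch below, is to read those 3 elements as an axis-angle vector and apply the Rodrigues formula; both the axis-angle choice and the (6, H, W) output layout are assumptions, since the excerpt does not specify them.

```python
# Hedged NumPy sketch: spatially average a (6, H, W) pose-network output and
# turn the resulting 6-vector into a 4x4 rigid transform. The axis-angle
# (Rodrigues) rotation parameterization is an assumption.
import numpy as np

def pose_from_feature_map(feat):
    """feat: (6, H, W) network output -> 4x4 transform [R | t]."""
    p = feat.mean(axis=(1, 2))               # average over width and height
    w, t = p[:3], p[3:]                      # 3 for rotation, 3 for translation
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        R = np.eye(3)                        # near-zero rotation
    else:
        k = w / theta                        # unit rotation axis
        K_hat = np.array([[0, -k[2], k[1]],
                          [k[2], 0, -k[0]],
                          [-k[1], k[0], 0]])  # skew-symmetric cross matrix
        R = np.eye(3) + np.sin(theta) * K_hat + (1 - np.cos(theta)) * (K_hat @ K_hat)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```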
“…Since Laina [40] constructed a fully convolutional architecture to predict the depth map, subsequent works [41], [42] have benefited from the increasing capability of FCNs and achieved promising results. In addition, Fei [43] proposed a semantically informed geometric loss, while Wei [44] uses a virtual normal loss to constrain the structure information. As in semantic segmentation, some works also try to replace the encoder with efficient backbones [13], [44], [45] to decrease the computational cost, but they suffer from training problems caused by the limited capacity of the compact network.…”
Section: Depth Estimation (mentioning)
confidence: 99%
“…Building on these observations, we choose to learn the easier tasks first, then use them as guidance for the more complex task of depth estimation through our novel consensus loss terms of Eqs. (7) and (6). See the supplementary material for more details on the training procedure.…”
Section: Implementation Details (mentioning)
confidence: 99%
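The staged training mentioned in this excerpt is not detailed here. As a loose illustration only, such curricula are often realized by phasing in the weight of the harder objective over training; the schedule below is a generic sketch, not the cited paper's recipe.

```python
# Generic curriculum-schedule sketch (not the cited paper's procedure):
# the depth term is phased in after the easier tasks have had a head start.
def loss_weight(epoch, start=5, ramp=10):
    """0 before `start`, then a linear ramp to 1 over `ramp` epochs."""
    return min(max((epoch - start) / ramp, 0.0), 1.0)

def total_loss(easy_losses, depth_loss, epoch):
    """Sum of the easier-task losses plus the scheduled depth term."""
    return sum(easy_losses) + loss_weight(epoch) * depth_loss
```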