Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints

Mahjourian, Reza; Wicke, Martin; Angelova, Anelia

doi:10.1109/cvpr.2018.00594

Cited by 719 publications

(666 citation statements)

References 29 publications

Citation types: Supporting, Mentioning (638), Contrasting, Unclassified

“…However, learning to estimate depth purely from video breaks the static scene assumption and necessitates the use of an attention mechanism for foreground motion between consecutive frames. More recent iterations of this direction added scale normalization and removal of the separate pose estimation branch [40], 3D geometric constraints between the predicted depths [23], epipolar constraints [29], additional feature reconstruction supervision [44], stereo matching constraints [43] or explicitly used two consecutive frames as input [28].…”

Section: Related Work (mentioning)

confidence: 99%

Spherical View Synthesis for Self-Supervised 360° Depth Estimation

Zioulis, Karakottas, Zarpalas, et al. 2019

2019 International Conference on 3D Vision (3DV)

Learning based approaches for depth perception are limited by the availability of clean training data. This has led to the utilization of view synthesis as an indirect objective for learning depth estimation using efficient data acquisition procedures. Nonetheless, most research focuses on pinhole based monocular vision, with scarce works presenting results for omnidirectional input. In this work, we explore spherical view synthesis for learning monocular 360° depth in a self-supervised manner and demonstrate its feasibility. Under a purely geometrically derived formulation we present results for horizontal and vertical baselines, as well as for the trinocular case. Further, we show how to better exploit the expressiveness of traditional CNNs when applied to the equirectangular domain in an efficient manner. Finally, given the availability of ground truth depth data, our work is uniquely positioned to compare view synthesis against direct supervision in a consistent and fair manner. The results indicate that alternative research directions might be better suited to enable higher quality depth perception. Our data, models and code are publicly available at https

“…Depth estimation from a single image has gained increasing attention in the computer vision community. Most works like [37,38,20,15,39,41,9,16] are proposed for indoor and outdoor scenes. We focus on depth estimation of humans, which allows us to build much stronger shape prior than these generic depth estimation methods.…”

Section: Related Work (mentioning)

confidence: 99%

A Neural Network for Detailed Human Depth Estimation From a Single Image

Tang, Tan, Cheng, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

This paper presents a neural network to estimate a detailed depth map of the foreground human in a single RGB image. The result captures geometry details such as cloth wrinkles, which are important in visualization applications. To achieve this goal, we separate the depth map into a smooth base shape and a residual detail shape and design a network with two branches to regress them respectively. We design a training strategy to ensure both base and detail shapes can be faithfully learned by the corresponding network branches. Furthermore, we introduce a novel network layer to fuse a rough depth map and surface normals to further improve the final result. Quantitative comparison with fused 'ground truth' captured by real depth cameras and qualitative examples on unconstrained Internet images demonstrate the strength of the proposed method. Our code will be released at Link

“…There are also some approaches proposed to get rid of the PoseNet. For instance, Mahjourian et al [32] used Iterative Closest Point (ICP) [2,4,36] to compute a transformation that minimizes point-to-point distances between corresponding points. Wang et al [42] used direct visual odometry (DVO) [40] to obtain camera pose from predicted depth and images.…”

Section: Posenet (mentioning)

confidence: 99%

Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments

Zhou, Wang, Qin, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Recently unsupervised learning of depth from videos has made remarkable progress and the results are comparable to fully supervised methods in outdoor scenes like KITTI. However, there still exist great challenges when directly applying this technology in indoor environments, e.g., large areas of non-texture regions like white wall, more complex ego-motion of handheld camera, transparent glasses and shiny objects. To overcome these problems, we propose a new optical-flow based training paradigm which reduces the difficulty of unsupervised learning by providing a clearer training target and handles the non-texture regions. Our experimental evaluation demonstrates that the result of our method is comparable to fully supervised methods on the NYU Depth V2 benchmark. To the best of our knowledge, this is the first quantitative result of purely unsupervised learning method reported on indoor datasets.
