Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction

Zhan, Huangying; Garg, Ravi; Weerasekera, Chamara Saroj; Li, Kejie; Agarwal, Harsh; Reid, Ian

doi:10.1109/cvpr.2018.00043

Cited by 622 publications

(499 citation statements)

References 41 publications

(98 reference statements)

Supporting

Mentioning

471

Contrasting

Unclassified

“…However, learning to estimate depth purely from video breaks the static scene assumption and necessitates the use of an attention mechanism for foreground motion between consecutive frames. More recent iterations of this direction added scale normalization and removal of the separate pose estimation branch [40], 3D geometric constraints between the predicted depths [23], epipolar constraints [29], additional feature reconstruction supervision [44], stereo matching constraints [43] or explicitly used two consecutive frames as input [28].…”

Section: Related Work (mentioning)

confidence: 99%

Spherical View Synthesis for Self-Supervised 360° Depth Estimation

Zioulis, Karakottas, Zarpalas, et al. 2019

2019 International Conference on 3D Vision (3DV)

Learning based approaches for depth perception are limited by the availability of clean training data. This has led to the utilization of view synthesis as an indirect objective for learning depth estimation using efficient data acquisition procedures. Nonetheless, most research focuses on pinhole based monocular vision, with scarce works presenting results for omnidirectional input. In this work, we explore spherical view synthesis for learning monocular 360° depth in a self-supervised manner and demonstrate its feasibility. Under a purely geometrically derived formulation we present results for horizontal and vertical baselines, as well as for the trinocular case. Further, we show how to better exploit the expressiveness of traditional CNNs when applied to the equirectangular domain in an efficient manner. Finally, given the availability of ground truth depth data, our work is uniquely positioned to compare view synthesis against direct supervision in a consistent and fair manner. The results indicate that alternative research directions might be better suited to enable higher quality depth perception. Our data, models and code are publicly available at https

“…Depth estimation from a single image has gained increasing attention in the computer vision community. Most works like [37,38,20,15,39,41,9,16] are proposed for indoor and outdoor scenes. We focus on depth estimation of humans, which allows us to build much stronger shape prior than these generic depth estimation methods.…”

Section: Related Work (mentioning)

confidence: 99%

A Neural Network for Detailed Human Depth Estimation From a Single Image

Tang, Tan, Cheng, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

This paper presents a neural network to estimate a detailed depth map of the foreground human in a single RGB image. The result captures geometry details such as cloth wrinkles, which are important in visualization applications. To achieve this goal, we separate the depth map into a smooth base shape and a residual detail shape and design a network with two branches to regress them respectively. We design a training strategy to ensure both base and detail shapes can be faithfully learned by the corresponding network branches. Furthermore, we introduce a novel network layer to fuse a rough depth map and surface normals to further improve the final result. Quantitative comparison with fused 'ground truth' captured by real depth cameras and qualitative examples on unconstrained Internet images demonstrate the strength of the proposed method. Our code will be released at Link

“…Leveraging temporal stereo sequences for unsupervised monocular depth and pose estimation, e.g. by warping deep features, improves the accuracy of both tasks [55]. With the same result, Zou et al [60] jointly train for optical flow, pose and depth estimation simultaneously while Jiao et al [23] mutually improve semantics and depth and GeoNet [53] jointly estimates depth, optical flow and camera pose from video.…”

Section: Monocular Vision (mentioning)

confidence: 96%

SteReFo: Efficient Image Refocusing with Stereo Vision

Busam, Hog, McDonagh, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)

Whether to attract viewer attention to a particular object, give the impression of depth or simply reproduce humanlike scene perception, shallow depth of field images are used extensively by professional and amateur photographers alike. To this end, high quality optical systems are used in DSLR cameras to focus on a specific depth plane while producing visually pleasing bokeh. We propose a physically motivated pipeline to mimic this effect from all-in-focus stereo images, typically retrieved by mobile cameras. It is capable to change the focal plane a posteriori at 76 FPS on KITTI [13] images to enable realtime applications. As our portmanteau suggests, SteReFo interrelates stereo-based depth estimation and refocusing efficiently. In contrast to other approaches, our pipeline is simultaneously fully differentiable, physically motivated, and agnostic to scene content. It also enables computational video focus tracking for moving objects in addition to refocusing of static images. We evaluate our approach on publicly available datasets [13,33,9] and quantify the quality of architectural changes.
