2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00594

Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints

Abstract: We present a novel approach for unsupervised learning of depth and ego-motion from monocular video. Unsupervised learning removes the need for separate supervisory signals (depth or ego-motion ground truth, or multi-view video). Prior work in unsupervised depth learning uses pixel-wise or gradient-based losses, which only consider pixels in small local neighborhoods. Our main contribution is to explicitly consider the inferred 3D geometry of the whole scene, and enforce consistency of the estimated 3D point cl…

Citations: cited by 719 publications (666 citation statements)
References: 29 publications
“…However, learning to estimate depth purely from video breaks the static scene assumption and necessitates the use of an attention mechanism for foreground motion between consecutive frames. More recent iterations of this direction added scale normalization and removal of the separate pose estimation branch [40], 3D geometric constraints between the predicted depths [23], epipolar constraints [29], additional feature reconstruction supervision [44], stereo matching constraints [43] or explicitly used two consecutive frames as input [28].…”
Section: Related Work (mentioning)
confidence: 99%
“…Depth estimation from a single image has gained increasing attention in the computer vision community. Most works like [37,38,20,15,39,41,9,16] are proposed for indoor and outdoor scenes. We focus on depth estimation of humans, which allows us to build much stronger shape prior than these generic depth estimation methods.…”
Section: Related Work (mentioning)
confidence: 99%
“…There are also some approaches proposed to get rid of the PoseNet. For instance, Mahjourian et al [32] used Iterative Closest Point (ICP) [2,4,36] to compute a transformation that minimizes point-to-point distances between corresponding points. Wang et al [42] used direct visual odometry (DVO) [40] to obtain camera pose from predicted depth and images.…”
Section: Posenet (mentioning)
confidence: 99%
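
The citation statement above describes ICP as computing a transformation that minimizes point-to-point distances between corresponding points. A minimal sketch of that alignment step follows, assuming the correspondences are already known and using the standard SVD-based closed-form solution. It is not taken from the cited paper: the function name `best_rigid_transform` and the toy data are hypothetical, and a full ICP loop would also re-estimate correspondences (for example by nearest-neighbour search) on every iteration.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src[i] + t ~= dst[i].

    src, dst: (N, 3) arrays of corresponding 3D points (row i matches row i).
    Hypothetical helper for illustration only; not the cited paper's code.
    """
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)

    # 3x3 cross-covariance of the centred point sets.
    H = (src - src_centroid).T @ (dst - dst_centroid)

    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (det = +1),
    # not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t


if __name__ == "__main__":
    # Toy data (assumed): rotate 30 degrees about z, then translate.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(50, 3))
    theta = np.deg2rad(30.0)
    R_true = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ])
    t_true = np.array([1.0, 2.0, 3.0])
    dst = src @ R_true.T + t_true

    R, t = best_rigid_transform(src, dst)
    print("max rotation error:", np.abs(R - R_true).max())
    print("max translation error:", np.abs(t - t_true).max())
```

With exact correspondences and no noise, the recovered transform should match the one used to generate the data up to floating-point error; with estimated correspondences, this step is what each ICP iteration would repeat.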