2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00871
Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments

Abstract: Recently unsupervised learning of depth from videos has made remarkable progress and the results are comparable to fully supervised methods in outdoor scenes like KITTI. However, there still exist great challenges when directly applying this technology in indoor environments, e.g., large areas of non-texture regions like white wall, more complex ego-motion of handheld camera, transparent glasses and shiny objects. To overcome these problems, we propose a new optical-flow based training paradigm which reduces t…

Cited by 57 publications (56 citation statements)

References 43 publications
“…(i) SFMLearner (and similarly, GeoNet, etc.) results were shown on the relatively simple KITTI dataset, but work poorly on more complex data [78] because its spatial representation is a 2.5D depth map. We use the predicted depth and camera transformation to warp the first frame into the target frame.…”
Section: Dosovitsky et al.
confidence: 99%
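The warping step quoted above — using the predicted depth and the relative camera transformation to synthesize the target frame from another view — can be sketched minimally. This is not the paper's implementation: the function name, the grayscale input, and the nearest-neighbor sampling are illustrative assumptions; unsupervised depth pipelines use a differentiable bilinear sampler so the photometric loss can backpropagate.

```python
import numpy as np

def inverse_warp(src_img, depth_t, K, T_t2s):
    """Warp a source frame into the target view using the target-frame
    depth map and the relative camera pose (target -> source).

    src_img : (H, W) grayscale source frame
    depth_t : (H, W) predicted depth for the target frame
    K       : (3, 3) camera intrinsics
    T_t2s   : (4, 4) rigid transform from target to source camera
    """
    h, w = depth_t.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Homogeneous pixel grid of the target frame, shape (3, H*W).
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(np.float64)
    # Back-project to 3D target-camera coordinates using the predicted depth.
    cam = np.linalg.inv(K) @ pix * depth_t.reshape(1, -1)
    # Move the points into the source camera and project them.
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    proj = K @ (T_t2s @ cam_h)[:3]
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (proj[2] > 0)
    # Nearest-neighbor sampling for simplicity; pixels projecting
    # outside the source frame are left at zero.
    out = np.zeros_like(src_img, dtype=np.float64)
    out.reshape(-1)[valid] = src_img[v[valid], u[valid]]
    return out
```

With an identity pose and constant depth, the projection maps each pixel back onto itself, so the warp reproduces the source frame exactly; the photometric difference between such a warped frame and the real target frame is what supervises the depth and pose networks.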
“…the indoor scenario, and only a few attempts have been made. As pointed out by the pioneer work [59], indoor videos, such as the NYU Depth V2 dataset [43], have complicated ego-motion, as they are usually recorded by handheld cameras. The problem can be alleviated by sampling the more distant (±10) frames as the source frames [59] or weakly rectifying the training sequences [3].…”
Section: Image
confidence: 99%
“…As pointed out by the pioneer work [59], indoor videos, such as the NYU Depth V2 dataset [43], have complicated ego-motion, as they are usually recorded by handheld cameras. The problem can be alleviated by sampling the more distant (±10) frames as the source frames [59] or weakly rectifying the training sequences [3]. Alternatively, we could construct a dataset by moving the camera steadily and sufficiently to solve the problem.…”
Section: Image
confidence: 99%