2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00871
Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments

Abstract: Recently unsupervised learning of depth from videos has made remarkable progress and the results are comparable to fully supervised methods in outdoor scenes like KITTI. However, there still exist great challenges when directly applying this technology in indoor environments, e.g., large areas of non-texture regions like white wall, more complex ego-motion of handheld camera, transparent glasses and shiny objects. To overcome these problems, we propose a new optical-flow based training paradigm which reduces t…

Cited by 57 publications (56 citation statements)

References 43 publications
“…(i) SFMLearner (and similarly, GeoNet, etc.) results were shown on the relatively simple KITTI dataset, but work poorly on more complex data [78] because its spatial representation is a 2.5D depth map. We use the predicted depth and camera transformation to warp the first frame into the target frame.…”
Section: Dosovitsky et al.
confidence: 99%
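The warping step quoted above — using the predicted depth and the relative camera transformation to synthesize the target frame from another view — can be sketched minimally. This is not the paper's implementation: the function name, the grayscale input, and the nearest-neighbor sampling are illustrative assumptions; unsupervised depth pipelines use a differentiable bilinear sampler so the photometric loss can backpropagate.

```python
import numpy as np

def inverse_warp(src_img, depth_t, K, T_t2s):
    """Warp a source frame into the target view using the target-frame
    depth map and the relative camera pose (target -> source).

    src_img : (H, W) grayscale source frame
    depth_t : (H, W) predicted depth for the target frame
    K       : (3, 3) camera intrinsics
    T_t2s   : (4, 4) rigid transform from target to source camera
    """
    h, w = depth_t.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Homogeneous pixel grid of the target frame, shape (3, H*W).
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(np.float64)
    # Back-project to 3D target-camera coordinates using the predicted depth.
    cam = np.linalg.inv(K) @ pix * depth_t.reshape(1, -1)
    # Move the points into the source camera and project them.
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    proj = K @ (T_t2s @ cam_h)[:3]
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (proj[2] > 0)
    # Nearest-neighbor sampling for simplicity; pixels projecting
    # outside the source frame are left at zero.
    out = np.zeros_like(src_img, dtype=np.float64)
    out.reshape(-1)[valid] = src_img[v[valid], u[valid]]
    return out
```

With an identity pose and constant depth, the projection maps each pixel back onto itself, so the warp reproduces the source frame exactly; the photometric difference between such a warped frame and the real target frame is what supervises the depth and pose networks.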
“…the indoor scenario, and only a few attempts have been made. As pointed out by the pioneer work [59], indoor videos, such as the NYU Depth V2 dataset [43], have complicated ego-motion, as they are usually recorded by handheld cameras. The problem can be alleviated by sampling the more distant (±10) frames as the source frames [59] or weakly rectifying the training sequences [3].…”
Section: Image
confidence: 99%
“…As pointed out by the pioneer work [59], indoor videos, such as the NYU Depth V2 dataset [43], have complicated ego-motion, as they are usually recorded by handheld cameras. The problem can be alleviated by sampling the more distant (±10) frames as the source frames [59] or weakly rectifying the training sequences [3]. Alternatively, we could construct a dataset by moving the camera steadily and sufficiently to solve the problem.…”
Section: Image
confidence: 99%