Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference

Yao, Yao; Luo, Zixin; Li, Shiwei; Shen, Tianwei; Fang, Tian; Quan, Long

doi:10.1109/cvpr.2019.00567

Cited by 434 publications

(441 citation statements)

References 29 publications


“…Im et al [24] applied a plane sweeping approach to build a cost volume from deep features, then regularized the cost volume via a context-aware aggregation to improve depth regression. Very recently, Yao et al [31] introduced a scalable MVS framework based on the recurrent neural network to reduce the memory-consuming. Unsupervised Geometric Learning: Unsupervised learning has been developed in monocular depth estimation and binocular stereo matching by exploiting the photometric consistency and regularization.…”

Section: Related Work (mentioning; confidence: 99%)
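The memory reduction attributed to Yao et al. [31] comes from regularizing the cost volume sequentially along the depth direction instead of applying 3D convolutions to the whole volume at once. The toy sketch below assumes cost slices arrive one depth at a time. A per-pixel gated recurrent update carries a hidden state along depth, using hypothetical scalar weights `w_z`, `u_z`, `b_z`, `w_h` in place of learned convolutional GRU filters. A running winner-take-all selection keeps the best depth seen so far. The function name `recurrent_wta_depth` is illustrative and not taken from any released code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def recurrent_wta_depth(cost_slices, depths, w_z=1.0, u_z=1.0, b_z=0.0, w_h=1.0):
    """Toy sequential cost-volume regularization with winner-take-all depth.

    cost_slices: iterable of (H, W) matching-cost maps, one per depth
                 hypothesis, visited in order (lower cost = better match).
    depths:      depth value associated with each slice.
    w_z, u_z, b_z, w_h: hypothetical scalar weights standing in for the
                 learned filters of a real convolutional GRU.
    Only three (H, W) arrays persist across iterations, so memory does not
    grow with the number of depth hypotheses.
    """
    hidden = best_score = best_depth = None
    for cost, depth in zip(cost_slices, depths):
        cost = np.asarray(cost, dtype=float)
        if hidden is None:
            hidden = np.zeros_like(cost)
            best_score = np.full(cost.shape, -np.inf)
            best_depth = np.zeros(cost.shape)
        # Update gate: how much of the new slice replaces the carried state.
        z = sigmoid(w_z * cost + u_z * hidden + b_z)
        candidate = np.tanh(w_h * cost)
        hidden = (1.0 - z) * hidden + z * candidate
        # Treat the regularized cost as a negative logit. The argmax of a
        # softmax over depth equals the argmax of its logits, so no
        # normalization over the full depth range is required.
        score = -hidden
        better = score > best_score
        best_score = np.where(better, score, best_score)
        best_depth = np.where(better, depth, best_depth)
    return best_depth
```

A large `b_z` saturates the update gate at 1. The hidden state then becomes `tanh(w_h * cost)`, and the result reduces to a plain per-pixel argmin over the slices. Smaller gate values let earlier depth slices influence later ones.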

“…Recently, the success of deep convolutional neural networks (CNNs) in monocular depth estimation [18,9,19] and binocular depth estimation [33,34] has been extended to MVS. Existing deep CNNs based MVS approaches [30,31,11,24] tend to represent MVS as an end-to-end regression problem. By exploiting large-scale ground truth 3D training data, these methods outperform traditional geometry-based approaches and dominate the leading boards on different benchmarking datasets [30,31].…”

Section: Introduction (mentioning; confidence: 99%)
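Treating MVS as end-to-end regression usually means the network outputs a cost (or probability) per depth hypothesis. The depth is then read out with a differentiable soft argmin: the softmax-weighted mean of the hypotheses. Below is a minimal NumPy sketch. It assumes a `(D, H, W)` cost volume where lower cost means a better match, and `soft_argmin_depth` is a hypothetical helper name.

```python
import numpy as np


def soft_argmin_depth(cost_volume, depths):
    """Differentiable depth regression from a (D, H, W) cost volume.

    Costs become a probability volume via a softmax over the depth axis
    (lower cost -> higher probability); the estimate is the
    probability-weighted mean of the depth hypotheses.
    """
    cost_volume = np.asarray(cost_volume, dtype=float)
    depths = np.asarray(depths, dtype=float).reshape(-1, 1, 1)
    logits = -cost_volume
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return (prob * depths).sum(axis=0)
```

Because the readout is an expectation, it gives sub-hypothesis precision. However, it can land between two modes when the per-pixel distribution is multimodal.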

“…Existing deep CNNs based MVS approaches [30,31,11,24] tend to represent MVS as an end-to-end regression problem. By exploiting large-scale ground truth 3D training data, these methods outperform traditional geometry-based approaches and dominate the leading boards on different benchmarking datasets [30,31]. However, the success of these supervised MVS approaches strongly depends on the availability of large-scale ground-truth 3D training data, which not only not always available but also may further hinder their generalization ability in never-seen-before open-world scenarios [34].…”

Section: Introduction (mentioning; confidence: 99%)

“…Our network structure differs from existing MVS and simple extension of unsupervised binocular stereo matching in the following aspects: a) Our network is symmetric to all the views, i.e., it treats each view equivalently and predicts the depth map for each view simultaneously. Existing supervised learning based MVS methods [30,31,11,27] apply an "asymmetric" design and infer depth map for the reference image only. Thus, multiple depth maps estimated from different viewpoints do not comply with the same 3D geometry and 3D point clouds processing is required to derive a consistent 3D geometry.…”

Section: Introduction (mentioning; confidence: 99%)
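Deriving a consistent geometry from per-view depth maps typically starts with a cross-view check. Each reference pixel is back-projected with its depth and transformed into a neighbouring view. The resulting depth is then compared with that view's own depth map. The sketch below performs this one-directional check under stated assumptions:

- shared intrinsics `K`;
- a relative pose `R`, `t` mapping reference to source coordinates;
- nearest-neighbour lookup;
- a hypothetical relative tolerance `rel_tol`.

Fusion pipelines often add a second reprojection test back into the reference view.

```python
import numpy as np


def cross_view_consistency(depth_ref, depth_src, K, R, t, rel_tol=0.01):
    """Boolean mask of reference pixels whose depth agrees with the source view.

    Assumptions: both views share intrinsics K; (R, t) map reference-camera
    coordinates to source-camera coordinates (X_src = R @ X_ref + t); the
    source depth is looked up with nearest-neighbour sampling.
    """
    H, W = depth_ref.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])   # (3, N) homogeneous pixels
    pts_ref = np.linalg.inv(K) @ pix * depth_ref.ravel()      # back-project to 3D
    pts_src = R @ pts_ref + np.asarray(t, dtype=float).reshape(3, 1)
    z_src = pts_src[2]                                         # depth seen from the source camera
    proj = K @ pts_src
    with np.errstate(divide="ignore", invalid="ignore"):
        us = np.round(proj[0] / proj[2])
        vs = np.round(proj[1] / proj[2])
    inside = (z_src > 0) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
    mask = np.zeros(H * W, dtype=bool)
    idx = np.flatnonzero(inside)
    sampled = depth_src[vs[idx].astype(int), us[idx].astype(int)]
    mask[idx] = np.abs(sampled - z_src[idx]) <= rel_tol * z_src[idx]
    return mask.reshape(H, W)
```

Pixels that project outside the source image are marked inconsistent here. Treating them as "unknown" instead is an equally reasonable design choice.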


MVS2: Deep Unsupervised Multi-View Stereo with Multi-View Symmetry

Dai, Zhang, Rao, et al. (2019)

2019 International Conference on 3D Vision (3DV)


The success of existing deep-learning based multi-view stereo (MVS) approaches greatly depends on the availability of large-scale supervision in the form of dense depth maps. Such supervision, while not always possible, tends to hinder the generalization ability of the learned models in never-seen-before scenarios. In this paper, we propose the first unsupervised learning based MVS network, which learns the multi-view depth maps from the input multi-view images and does not need ground-truth 3D training data. Our network is symmetric in predicting depth maps for all views simultaneously, where we enforce cross-view consistency of multi-view depth maps during both training and testing stages. Thus, the learned multi-view depth maps naturally comply with the underlying 3D scene geometry. Besides, our network also learns the multi-view occlusion maps, which further improves the robustness of our network in handling real-world occlusions. Experimental results on multiple benchmarking datasets demonstrate the effectiveness of our network and the excellent generalization ability.
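Unsupervised stereo and MVS methods typically replace ground-truth depth with a photometric consistency signal. A neighbouring view is warped into the reference view using the predicted depth, and intensity differences are penalized. The sketch below is a simplified, hypothetical version for a rectified two-view pair with a purely horizontal baseline. MVS2 itself handles general multi-view camera poses and adds cross-view consistency and occlusion reasoning, none of which is modelled here. The names `photometric_l1`, `focal`, and `baseline` are illustrative.

```python
import numpy as np


def photometric_l1(ref_img, src_img, depth_ref, focal, baseline):
    """Masked mean L1 photometric error after warping src into the ref view.

    Assumes a rectified grayscale pair with a purely horizontal baseline: a
    reference pixel (x, y) with depth Z corresponds to source pixel
    (x - focal * baseline / Z, y). Uses linear interpolation along each row;
    images must be at least 2 pixels wide.
    """
    H, W = ref_img.shape
    xs = np.arange(W)[None, :] - focal * baseline / depth_ref   # (H, W) source x-coordinates
    valid = (xs >= 0) & (xs <= W - 1)                            # samples that land inside src
    if not valid.any():
        return 0.0
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    frac = np.clip(xs - x0, 0.0, 1.0)
    rows = np.arange(H)[:, None]
    warped = (1 - frac) * src_img[rows, x0] + frac * src_img[rows, x0 + 1]
    return float(np.abs(warped - ref_img)[valid].mean())
```

In a training loop, this scalar would be minimized with respect to the predicted `depth_ref`. Practical methods usually combine it with structural-similarity terms and smoothness regularization.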


“…Given multiple images with known camera poses and intrinsic calibration, DeepMVS [10] generates cost volumes using learned feature maps and then estimates the disparity map by fusing multiple cost volumes. MVDepthNet [11], DPSNet [12] and MVSNet [13], [14] solve the same reconstruction problem but differ in the calculation of cost volumes and the structure of networks. On the other hand, given an RGB-D keyframe, DeepTAM [15] incrementally tracks the pose of a camera using synthetic viewpoints and can further estimate the depth map of the tracked frame.…”

Section: Related Work (mentioning; confidence: 99%)
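These networks differ mainly in how the cost volume is computed. As one concrete example, MVSNet builds its cost volume with a variance-based metric, so any number of views can be aggregated into a single volume. The short sketch below assumes the per-depth homography warping of source features onto the reference view has already been done (it is not shown). The helper name `variance_cost_volume` is hypothetical.

```python
import numpy as np


def variance_cost_volume(warped_features):
    """Variance-based multi-view cost volume.

    warped_features: array of shape (N, D, C, H, W) holding the reference
    feature map and the N-1 source feature maps, already warped onto the
    reference view at each of D depth hypotheses.
    Returns a (D, C, H, W) volume; low variance means the views agree there.
    """
    feats = np.asarray(warped_features, dtype=float)
    mean = feats.mean(axis=0)                  # average over views
    return ((feats - mean) ** 2).mean(axis=0)  # per-element variance over views
```

Because the variance is symmetric in the views and its shape does not depend on N, the same downstream regularizer can be reused for any number of input images.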

Flow-Motion and Depth Network for Monocular Stereo and Beyond

Wang, Shen (2020)

IEEE Robot. Autom. Lett.


We propose a learning-based method that solves monocular stereo and can be extended to fuse depth information from multiple target frames. Given two unconstrained images from a monocular camera with known intrinsic calibration, our network estimates relative camera poses and the depth map of the source image. The core contribution of the proposed method is threefold. First, a network is tailored for static scenes that jointly estimates the optical flow and camera motion. By the joint estimation, the optical flow search space is gradually reduced resulting in an efficient and accurate flow estimation. Second, a novel triangulation layer is proposed to encode the estimated optical flow and camera motion while avoiding common numerical issues caused by epipolar. Third, beyond two-view depth estimation, we further extend the above networks to fuse depth information from multiple target images and estimate the depth map of the source image. To further benefit the research community, we introduce tools to generate photorealistic structure-from-motion datasets such that deep networks can be well trained and evaluated. The proposed method is compared with previous methods and achieves state-of-the-art results within less time. Images from real-world applications and Google Earth are used to demonstrate the generalization ability of the method.
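As a point of reference for the triangulation step described above, the textbook linear solution below recovers depth along the reference ray from a single flow correspondence and a known relative pose. It is not the learned triangulation layer proposed in the paper. The name `triangulate_depth`, the normalized-coordinate inputs, and the convention `X_src = R @ X_ref + t` are assumptions of this sketch.

```python
import numpy as np


def triangulate_depth(x_ref, x_src, R, t, eps=1e-12):
    """Linear two-view triangulation of depth along the reference ray.

    x_ref, x_src: normalized image coordinates (intrinsics already removed),
                  e.g. a pixel and its optical-flow match.
    R, t:         relative pose with X_src = R @ X_ref + t (assumed convention).
    The source ray must be parallel to Z * R @ ref_h + t, i.e.
    src_h x (Z * R @ ref_h + t) = 0, which is linear in the depth Z.
    """
    ref_h = np.append(np.asarray(x_ref, dtype=float), 1.0)
    src_h = np.append(np.asarray(x_src, dtype=float), 1.0)
    coef = np.cross(src_h, R @ ref_h)                      # multiplies the unknown Z
    const = np.cross(src_h, np.asarray(t, dtype=float))    # constant term
    denom = coef @ coef
    if denom < eps:
        raise ValueError("no parallax: depth is unobservable for this correspondence")
    # Least-squares solution of coef * Z = -const.
    return float(-(coef @ const) / denom)
```

The closed form Z = -(coef · const) / (coef · coef) becomes ill-conditioned as parallax vanishes, for example under pure rotation or for points near the epipole.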


DeepSFM: Structure from Motion via Deep Bundle Adjustment

Wei, Zhang, et al. (2020)

Computer Vision – ECCV 2020
