2019 International Conference on 3D Vision (3DV) 2019
DOI: 10.1109/3dv.2019.00010

MVS2: Deep Unsupervised Multi-View Stereo with Multi-View Symmetry

Abstract: The success of existing deep-learning based multi-view stereo (MVS) approaches greatly depends on the availability of large-scale supervision in the form of dense depth maps. Such supervision, while not always possible, tends to hinder the generalization ability of the learned models in never-seen-before scenarios. In this paper, we propose the first unsupervised learning based MVS network, which learns the multi-view depth maps from the input multi-view images and does not need ground-truth 3D training data. …

Cited by 74 publications (76 citation statements)
References 24 publications
“…Moreover, due to the unideal run-time and memory requirements, the cascade pyramid structure [2] is proposed to build cost volume and infer depth in coarse to fine, which greatly reduces run-time and memory consumption. Besides, some unsupervised methods [7,3] are proposed to overcome the difficulty of obtaining ground-truth depth maps. These methods utilize photometric consistency and multi-view consistency to guide the training of the network.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Our method consists of three modules: multi-view preprocessing, an adaptive encoder-decoder network, postprocessing with shared occlusion masks, as shown in Figure 2. Inspired by present learning-based multi-view methods, such as [1,3], which use front-to-parallel planes at different depth as hypothesis planes, the first step of our pre-processing is to adopt homography to warp source images and then construct cost volumes, as shown in Figure 2. Given input sequences in the form of image-pose pairs, each of which contains a reference image $I_r$ for depth estimation, several additional source images $\{I_i\}_{i=1}^{N}$, and camera parameters.…”
Section: Network Architecture (citation type: mentioning)
confidence: 99%
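The homography warping mentioned in the statement above can be illustrated with a minimal sketch. It is not taken from the cited papers: the function names, the skew-free pinhole camera, and the pose convention X_src = R X_ref + t are all assumptions made for illustration. Under those assumptions, a fronto-parallel plane z = d in the reference frame induces the homography H(d) = K_src (R + t nᵀ / d) K_ref⁻¹ with n = (0, 0, 1), and sweeping d over a set of depth hypotheses gives one warp per hypothesis.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def intrinsics(fx, fy, cx, cy):
    """Skew-free pinhole intrinsic matrix (an assumed camera model)."""
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def intrinsics_inverse(fx, fy, cx, cy):
    """Closed-form inverse of the matrix returned by intrinsics()."""
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def plane_homography(k_src, rot, trans, k_ref_inv, depth):
    """Homography from reference pixels to source pixels for the plane z = depth.

    Assumed convention: X_src = rot * X_ref + trans, plane normal n = (0, 0, 1).
    """
    # R + t n^T / d: with n = (0, 0, 1) only the third column of R changes.
    m = [[rot[i][j] + (trans[i] / depth if j == 2 else 0.0) for j in range(3)]
         for i in range(3)]
    return matmul3(matmul3(k_src, m), k_ref_inv)


def warp_pixel(h, u, v):
    """Apply a homography to pixel (u, v) and dehomogenize the result."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w


# Hypothetical setup: identical cameras, source shifted 0.1 units along x.
K = intrinsics(100.0, 100.0, 50.0, 50.0)
K_inv = intrinsics_inverse(100.0, 100.0, 50.0, 50.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.1, 0.0, 0.0]

# One homography per depth hypothesis (the plane sweep).
depth_hypotheses = [2.0, 4.0]
homographies = [plane_homography(K, R, t, K_inv, d) for d in depth_hypotheses]
warped_centers = [warp_pixel(h, 50.0, 50.0) for h in homographies]
```

In this toy setup the shift of the warped principal point shrinks as the hypothesized depth grows, which is the cue a cost volume compares across hypotheses. A real pipeline would resample whole feature maps with bilinear interpolation rather than warp single pixels.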
“…While it is possible to train the network using this synthetic data, for successfully deploying the model in real scenes, we still require to fine-tune the model using data from the target domain [16]. Another alternative is adopting an unsupervised learning strategy [6,13]. In this case, the few existing unsupervised MVS approaches use an image reconstruction loss to supervise the training process.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
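The statement above says only that unsupervised MVS approaches supervise training with an image reconstruction loss; it does not give a formula. As a rough sketch of the idea, and not the formulation of any cited method, the example below computes a masked mean absolute difference between a reference image and a source image already warped into the reference view. The function name `masked_l1_loss`, the choice of an L1 penalty, and the binary validity mask are assumptions for illustration.

```python
def masked_l1_loss(reference, warped, mask):
    """Mean absolute intensity difference over pixels where mask is truthy.

    reference, warped, and mask are equally sized 2D nested lists.
    Returns 0.0 when no pixel is valid, to avoid dividing by zero.
    """
    total = 0.0
    count = 0
    for ref_row, warp_row, mask_row in zip(reference, warped, mask):
        for r, w, m in zip(ref_row, warp_row, mask_row):
            if m:  # skip pixels that warp outside the source image or are occluded
                total += abs(r - w)
                count += 1
    return total / count if count else 0.0


# Toy 2x2 grayscale images; the bottom-left pixel is marked invalid.
reference = [[0.0, 1.0], [0.5, 0.5]]
warped = [[0.0, 0.5], [0.0, 0.5]]
mask = [[1, 1], [0, 1]]
loss = masked_l1_loss(reference, warped, mask)
```

A depth estimate that warps the source image well makes this value small, so minimizing it provides a training signal without ground-truth depth. Published methods typically combine such a term with others, for example structural similarity or smoothness terms.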