Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation

Xu, Hongbin; Zhou, Zhipeng; Qiao, Yu; Kang, Wenxiong; Wu, Qiuxia

doi:10.1609/aaai.v35i4.16411

Cited by 50 publications

(26 citation statements)

References 33 publications

“…1, DS-MVSNet achieves the best accuracy, completeness, and overall score among coarse-to-fine supervised methods. Especially for completeness, the metric is improved by 10% compared with U-MVSNet-MS [30]. Fig.…”

Section: Evaluation on DTU Dataset (mentioning)

confidence: 94%

“…However, these methods cannot be trained in an end-to-end manner. JDACS [30] proposed an end-to-end network, supervised by photometric consistency, segmentation map and augmentation data. However, it requires a pretrained feature extraction backbone for segmentation and inferring two times due to data augmentation.…”

Section: Unsupervised Learning-based MVS (mentioning)

confidence: 99%

“…Although these methods achieve superior performance, they cannot be trained in an end-to-end manner. Furthermore, some methods utilize additional inputs besides the multi-images, such as pre-processed optical flow [31], augmented data which infer two times [30] in training, and the pre-trained image semantic segmentation backbone [30]. These methods make the training more complicated and need more additional information.…”

Section: Introduction (mentioning)

confidence: 99%

DS-MVSNet: Unsupervised Multi-view Stereo via Depth Synthesis

Li, Lu, Wang et al. 2022

Preprint

In recent years, supervised or unsupervised learning-based MVS methods achieved excellent performance compared with traditional methods. However, these methods only use the probability volume computed by cost volume regularization to predict reference depths and this manner cannot mine enough information from the probability volume. Furthermore, the unsupervised methods usually try to use two-step or additional inputs for training which make the procedure more complicated. In this paper, we propose the DS-MVSNet, an end-to-end unsupervised MVS structure with the source depths synthesis. To mine the information in probability volume, we creatively synthesize the source depths by splattering the probability volume and depth hypotheses to source views. Meanwhile, we propose the adaptive Gaussian sampling and improved adaptive bins sampling approach that improve the depths hypotheses accuracy. On the other hand, we utilize the source depths to render the reference images and propose depth consistency loss and depth smoothness loss. These can provide additional guidance according to photometric and geometric consistency in different views without additional inputs. Finally, we conduct a series of experiments on the DTU dataset and Tanks & Temples dataset that demonstrate the efficiency and robustness of our DS-MVSNet compared with the state-of-the-art methods.

CCS Concepts: Computing methodologies → Reconstruction.

“…The accuracy and completeness of the network reconstruction point cloud and the generalization ability of the network are better than most 3D reconstruction methods of the same period. Therefore, this method is widely used for depth estimation in most deep learning MVS networks [15, 16, 17, 18, 19, 20]. However, to ensure the accuracy of depth calculation, the storage requirement is three times that of image resolution.…”

Section: Introduction (mentioning)

confidence: 99%

“…In this architecture, multi-scale pyramid feature aggregation is used to construct a 3D cost volume with more context information, and the loss function combines pixel loss and feature loss. In 2021, Xu et al. [20] combined data augmentation and semantic segmentation as self-supervised signals, making the reconstruction effect comparable to that of the most advanced supervised learning networks. Yang et al. [27] comprehensively used various methods such as deep fusion, mesh generation and deep rendering in unsupervised networks to optimize the pseudo depth.…”

Section: Introduction (mentioning)

confidence: 99%

Unsupervised 3D Reconstruction with Multi-Measure and High-Resolution Loss

Zheng, Luo, Chen et al. 2022

Sensors

Multi-view 3D reconstruction technology based on deep learning is developing rapidly. Unsupervised learning has become a research hotspot because it does not need ground truth labels. The current unsupervised method mainly uses 3DCNN to regularize the cost volume to regression image depth. This approach results in high memory requirements and long computing time. In this paper, we propose an end-to-end unsupervised multi-view 3D reconstruction network framework based on PatchMatch, Unsup_patchmatchnet. It dramatically reduces memory requirements and computing time. We propose a feature point consistency loss function. We incorporate various self-supervised signals such as photometric consistency loss and semantic consistency loss into the loss function. At the same time, we propose a high-resolution loss method. This improves the reconstruction of high-resolution images. The experiment proves that the memory usage of the network is reduced by 80% and the running time is reduced by more than 50% compared with the network using 3DCNN method. The overall error of reconstructed 3D point cloud is only 0.501 mm. It is superior to most current unsupervised multi-view 3D reconstruction networks. Then, we test on different data sets and verify that the network has good generalization.
