Deep Neural Networks (DNNs) have the potential to improve the quality of image-based 3D reconstructions. A challenge which still remains is to utilize the potential of DNNs to improve 3D reconstructions from high-resolution image datasets as available by the ETH3D benchmark. In this paper, we propose a way to employ DNNs in the image domain to gain a significant quality improvement of geometric image based 3D reconstruction. This is achieved by utilizing confidence prediction networks which have been adapted to the Multi-View Stereo (MVS) case and are trained on automatically generated ground truth established by geometric error propagation. In addition to a semi-dense real-world ground truth dataset for training the DNN, we present a synthetic dataset to enlarge the training dataset. We demonstrate the utility of the confidence predictions for two essential steps within a 3D reconstruction pipeline: Firstly, to be used for outlier clustering and filtering and secondly to be used within a depth refinement step. The presented 3D reconstruction pipeline DeepC-MVS makes use of deep learning for an essential part in MVS from high-resolution images and the experimental evaluation on popular benchmarks demonstrates the achieved state-of-the-art quality in 3D reconstruction.
We present a novel deep-learning-based method for Multi-View Stereo. Our method estimates high resolution and highly precise depth maps iteratively, by traversing the continuous space of feasible depth values at each pixel in a binary decision fashion. The decision process leverages a deep-network architecture: this computes a pixelwise binary mask that establishes whether each pixel actual depth is in front or behind its current iteration individual depth hypothesis. Moreover, in order to handle occluded regions, at each iteration the results from different source images are fused using pixelwise weights estimated by a second network. Thanks to the adopted binary decision strategy, which permits an efficient exploration of the depth space, our method can handle high resolution images without trading resolution and precision. This sets it apart from most alternative learning-based Multi-View Stereo methods, where the explicit discretization of the depth space requires the processing of large cost volumes. We compare our method with state-of-the-art Multi-View Stereo methods on the DTU, Tanks and Temples and the challenging ETH3D benchmarks and show competitive results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.