We present an end-to-end deep learning architecture for depth map inference from multi-view images. In the network, we first extract deep visual image features, and then build the 3D cost volume upon the reference camera frustum via the differentiable homography warping. Next, we apply 3D convolutions to regularize and regress the initial depth map, which is then refined with the reference image to generate the final output. Our framework flexibly adapts arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature. The proposed MVSNet is demonstrated on the large-scale indoor DTU dataset. With simple post-processing, our method not only significantly outperforms previous state-of-the-arts, but also is several times faster in runtime. We also evaluate MVSNet on the complex outdoor Tanks and Temples dataset, where our method ranks first before April 18, 2018 without any fine-tuning, showing the strong generalization ability of MVSNet.
Deep learning has recently demonstrated its excellent performance for multi-view stereo (MVS). However, one major limitation of current learned MVS approaches is the scalability: the memory-consuming cost volume regularization makes the learned MVS hard to be applied to highresolution scenes. In this paper, we introduce a scalable multi-view stereo framework based on the recurrent neural network. Instead of regularizing the entire 3D cost volume in one go, the proposed Recurrent Multi-view Stereo Network (R-MVSNet) sequentially regularizes the 2D cost maps along the depth direction via the gated recurrent unit (GRU). This reduces dramatically the memory consumption and makes high-resolution reconstruction feasible. We first show the state-of-the-art performance achieved by the proposed R-MVSNet on the recent MVS benchmarks. Then, we further demonstrate the scalability of the proposed method on several large-scale scenarios, where previous learned approaches often fail due to the memory constraint. Code is available at https://github.com/ YoYo000/MVSNet.
Abstract-This paper proposes a quasi-dense approach to 3D surface model acquisition from uncalibrated images. First, correspondence information and geometry are computed based on new quasi-dense point features that are resampled subpixel points from a disparity map. The quasi-dense approach gives more robust and accurate geometry estimations than the standard sparse approach. The robustness is measured as the success rate of full automatic geometry estimation with all involved parameters fixed. The accuracy is measured by a fast gauge-free uncertainty estimation algorithm. The quasi-dense approach also works for more largely separated images than the sparse approach, therefore, it requires fewer images for modeling. More importantly, the quasidense approach delivers a high density of reconstructed 3D points on which a surface representation can be reconstructed. This fills the gap of insufficiency of the sparse approach for surface reconstruction, essential for modeling and visualization applications. Second, surface reconstruction methods from the given quasi-dense geometry are also developed. The algorithm optimizes new unified functionals integrating both 3D quasi-dense points and 2D image information, including silhouettes. Combining both 3D data and 2D images is more robust than the existing methods using only 2D information or only 3D data. An efficient bounded regularization method is proposed to implement the surface evolution by level-set methods. Its properties are discussed and proven for some cases. As a whole, a complete automatic and practical system of 3D modeling from raw images captured by hand-held cameras to surface representation is proposed. Extensive experiments demonstrate the superior performance of the quasi-dense approach with respect to the standard sparse approach in robustness, accuracy, and applicability.
In this paper, we propose a semi-automatic technique for modeling plants directly from images. Our image-based approach has the distinct advantage that the resulting model inherits the realistic shape and complexity of a real plant. We designed our modeling system to be interactive, automating the process of shape recovery while relying on the user to provide simple hints on segmentation. Segmentation is performed in both image and 3D spaces, allowing the user to easily visualize its effect immediately. Using the segmented image and 3D data, the geometry of each leaf is then automatically recovered from the multiple views by fitting a deformable leaf model. Our system also allows the user to easily reconstruct branches in a similar manner. We show realistic reconstructions of a variety of plants, and demonstrate examples of plant editing.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.