2022
DOI: 10.3390/s22197659

MVS-T: A Coarse-to-Fine Multi-View Stereo Network with Transformer for Low-Resolution Images 3D Reconstruction

Abstract: A coarse-to-fine multi-view stereo network with Transformer (MVS-T) is proposed to solve the problems of sparse point clouds and low accuracy in reconstructing 3D scenes from low-resolution multi-view images. The network uses a coarse-to-fine strategy to estimate the depth of the image progressively and reconstruct the 3D point cloud. First, pyramids of image features are constructed to transfer the semantic and spatial information among features at different scales. Then, the Transformer module is employed to…
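The coarse-to-fine idea in the abstract can be illustrated with a toy, single-pixel sketch. This is not the MVS-T network: the abstract is truncated and gives no architectural details, so the function name `coarse_to_fine_depth`, the parameters `num_stages`, `num_hypotheses`, and `shrink`, and the stand-in cost function are all assumptions made for illustration. The sketch only shows the general pattern of sampling depth hypotheses over a wide range first and then re-sampling a narrower range around the best candidate.

```python
from typing import Callable, List, Tuple


def coarse_to_fine_depth(
    cost: Callable[[float], float],
    depth_min: float,
    depth_max: float,
    num_stages: int = 3,       # hypothetical number of pyramid levels
    num_hypotheses: int = 8,   # hypothetical depth samples per level
    shrink: float = 0.5,       # hypothetical factor by which the range narrows
) -> Tuple[float, List[Tuple[float, float]]]:
    """Pick a low-cost depth by sampling progressively narrower ranges."""
    lo, hi = depth_min, depth_max
    searched: List[Tuple[float, float]] = []
    best = 0.5 * (lo + hi)
    for _ in range(num_stages):
        searched.append((lo, hi))
        # Sample evenly spaced depth hypotheses across the current range.
        step = (hi - lo) / (num_hypotheses - 1)
        hypotheses = [lo + i * step for i in range(num_hypotheses)]
        # Keep the hypothesis with the lowest matching cost.
        best = min(hypotheses, key=cost)
        # Narrow the range around the winner for the next, finer stage.
        half = 0.5 * shrink * (hi - lo)
        lo = max(depth_min, best - half)
        hi = min(depth_max, best + half)
    return best, searched


# Stand-in cost: distance to a made-up "true" depth. A real network would
# derive this cost from feature similarity across views.
true_depth = 3.7
estimate, searched = coarse_to_fine_depth(
    lambda d: abs(d - true_depth), depth_min=1.0, depth_max=9.0
)
print(estimate, searched)
```

Each stage reuses the same number of hypotheses over a smaller interval, so the depth resolution improves without the cost of sampling the full range densely.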

Cited by 5 publications (4 citation statements)
References 37 publications
“…Traditional methods, such as SFM and SLAM, require the sequential input of images to recover a 3D structure by feature extraction and matching, lacking robustness when dealing with separated and sparse viewpoints. Recently, multi-view stereo methods [1][2][3][4] based on deep learning have become crucial in this field. Unlike traditional methods, these deep learning approaches have fewer restrictions on the order and number of input images.…”
Section: Multi-view Reconstruction Methods (mentioning)
confidence: 99%
“…Utilizing multi-view images for reconstruction is an effective approach to solving the problem mentioned above, as it provides additional spatial information about the target object. Traditional methods, such as Structure from Motion (SFM) and Simultaneous Localization and Mapping (SLAM), and some deep learning methods [1][2][3][4] based on multi-view stereo (MVS) can establish feature correspondences across views, leading to more accurate reconstruction results. In these methods, significant overlap between viewpoints and even a fixed order of input images are often required to obtain feature correspondences for reconstruction.…”
Section: Introduction (mentioning)
confidence: 99%
“…Zhang et al. (2023) explicitly inferred and integrated the pixel-wise occlusion information in the MVSNet via matching uncertainty estimation. Attention modules and Transformers are hot spots in current research, and thus a few algorithms based on attention and Transformers have been proposed (Liao et al., 2022; Weilharter and Fraundorfer, 2022; Li et al., 2022a; Wan et al., 2022; Wang et al., 2022; Jia et al., 2022; Ding et al., 2022). However, large-scale scenes reconstructed by the current attention-based MVSNet are inaccurate and incomplete. To further improve multi-view stereo matching, in this study, we propose a novel attention-aware multi-view stereo network based on satellite imagery, namely, A-SATMVSNet.…”
Section: Introduction (mentioning)
confidence: 99%