2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.00840

RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

Abstract: Learning-based multi-view stereo (MVS) has so far centered on 3D convolution over cost volumes. Because 3D CNNs are costly in computation and memory, the resolution of the output depth is often considerably limited. Unlike most existing works, which are dedicated to adaptive refinement of cost volumes, we opt to directly optimize the depth value along each camera ray, mimicking the range (depth) finding of a laser scanner. This reduces the MVS problem to ray-based depth optimization, which is much more light…
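To make the phrase "depth value along each camera ray" concrete, the following is a minimal geometric sketch, not the paper's network or optimizer. It assumes an ideal pinhole camera (no skew, no lens distortion) and treats depth as the z-coordinate in the camera frame. The function names `backproject` and `sample_ray` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration. It only shows how candidate depths along one pixel's ray map to 3D points, which is the 1D search space a ray-based method would work in.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Return the camera-frame 3D point seen at pixel (u, v) at the given depth.

    Assumption: ideal pinhole camera; depth is the z-coordinate.
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def sample_ray(u: float, v: float, d_min: float, d_max: float, n: int,
               fx: float, fy: float, cx: float, cy: float) -> List[Point3]:
    """Sample n points at evenly spaced depths in [d_min, d_max] along one pixel's ray."""
    if n < 2:
        raise ValueError("need at least two samples")
    step = (d_max - d_min) / (n - 1)
    return [backproject(u, v, d_min + i * step, fx, fy, cx, cy)
            for i in range(n)]


# Usage: three depth hypotheses along the ray through the principal point.
points = sample_ray(320.0, 240.0, 1.0, 2.0, 3, 500.0, 500.0, 320.0, 240.0)
```

Each pixel contributes an independent 1D list of hypotheses here, in contrast to a dense 3D cost volume shared by all pixels. How the depth along the ray is actually optimized is not described in the truncated abstract and is not sketched here.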

Cited by 18 publications (3 citation statements)
References 76 publications (127 reference statements)
“…Compared to IterMVS [24], our lightweight method is significantly better (from 56.94 to 57.91) in terms of generalization performance, and it benefits from our highly accurate iterative variable optimizer and efficient fusion strategy with its enlarged receptive field. In addition, the full version of our method even surpasses the latest nonefficient method, RayMVSNet [54], in terms of generalization performance while still maintaining fast inference speed and low memory consumption. We report a depth map comparison in a large and complex outdoor scene, as shown in Figure 7.…”
Section: Main Results On the Tanks And Temples Dataset
Citation type: mentioning
confidence: 96%
“…More recently, transformer architectures and attention‐based mechanisms have been proposed for more efficient incorporation of the global context (Ding et al, 2022; Wang, Galliani, et al, 2022; Yu, Guo, et al, 2021; Zhang et al, 2021; Zhu et al, 2021). To avoid the computationally costly 3D convolutions, Xi et al (2022) proposed an attention‐based architecture to directly optimise the depth along the 1D viewing ray.…”
Section: Learning‐based Methods
Citation type: mentioning
confidence: 99%
“…Using a transformer to process encoded features from multiple views with epipolar constraints has been proposed in prior work on Multi-view Stereo, see e.g. (He et al, 2020; Xi et al, 2022; Ding et al, 2022).…”
Section: Decoder and Scene Function
Citation type: mentioning
confidence: 99%