2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.01507

DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal Fusion

Cited by 54 publications (51 citation statements)
References: 32 publications
“…Video-based depth estimation has attracted extensive attentions recently. They are mainly categorized into multi-view stereo based approaches [11,24,36] and hybrid methods [19,25]. The former try to improve the traditional structure-from-motion and multi-view stereo pipeline with some learning-based modules, such as a differentiable depth and pose modules or a depth estimation uncertainty predictor.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, CNN-SLAM [23] predicts depth images for keyframes, then refines them using small-baseline multi-view stereo from surrounding non-keyframe images. DeepVideoMVS [25] extends a cost volume-based encoder-decoder with a ConvLSTM cell at the bottleneck layer to leverage past scene geometry to improve depth prediction at the current time step. Unlike standard ConvLSTM methods, it can make geometrically correct predictions because of its underlying use of MVS at each time step.…”
Section: A Real-time Dense Monocular 3D Reconstruction (mentioning)
confidence: 99%
“…Inspired by stereo matching networks [26,3], MVS studies [43,4,16,22,20,39] have developed cost volume for unstructured multi-view matching. Relying on basic frameworks, such as DPSNet [22] or MVSNet [43], follow-up research proposes point-based depth refinement [4], cascaded depth refinement [16], and temporal fusion network [20,10]. After exhaustively estimating a collection of depth maps, depth fusion [14,8] starts to reconstruct the global 3D scene.…”
Section: Multi-view Stereo (mentioning)
confidence: 99%
“…Since this strategy proved to be effective, it became the most commonly used technique to build a cost volume. As a result, it has also been widely applied in un-rectified multi-view stereo pipelines [20,22,10]. Nonetheless, it appears that this representation is not appropriate for multiview stereo.…”
Section: Posed Convolution Layer (mentioning)
confidence: 99%