Consistent Video Depth Estimation (2020)
DOI: 10.1145/3386569.3392377
Abstract: We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video. We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video. Unlike the ad-hoc priors in classical reconstruction, we use a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation. At test time, we fine-tune this network to satisfy the geometric constraints of a particular input video, whi…
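The test-time fine-tuning idea from the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "network" here is reduced to one multiplicative scale parameter per frame, the geometric constraints are hypothetical precomputed pixel correspondences, and the optimizer is plain numerical gradient descent.

```python
import numpy as np

def consistency_loss(scales, depths, matches):
    """Mean squared depth disagreement over matched pixel pairs.

    scales:  per-frame multiplicative corrections (the 'fine-tuned' params)
    depths:  list of per-frame 1-D depth arrays
    matches: list of (frame_a, idx_a, frame_b, idx_b) correspondences,
             standing in for the structure-from-motion constraints
    """
    err = 0.0
    for fa, ia, fb, ib in matches:
        da = scales[fa] * depths[fa][ia]
        db = scales[fb] * depths[fb][ib]
        err += (da - db) ** 2
    return err / len(matches)

def finetune(depths, matches, lr=0.05, steps=200):
    """Gradient-descend the per-frame scales to satisfy the constraints."""
    scales = np.ones(len(depths))
    eps = 1e-4
    for _ in range(steps):
        base = consistency_loss(scales, depths, matches)
        grad = np.zeros_like(scales)
        for k in range(len(scales)):
            bumped = scales.copy()
            bumped[k] += eps
            # finite-difference gradient of the consistency loss
            grad[k] = (consistency_loss(bumped, depths, matches) - base) / eps
        scales -= lr * grad
    return scales
```

In the paper the optimized quantity is the full set of network weights and the constraints come from reprojection across frame pairs; the sketch only shows the shape of the loop: initial per-frame depths go in, a consistency loss over correspondences is minimized, corrected depths come out.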

Cited by 242 publications (178 citation statements). References 74 publications (93 reference statements).
“…While the results indicate high accuracy, a standard deviation comparable to the MAE reveals low precision. The results showed rapid fluctuation of the predicted depth due to independent per-frame processing, as described in Luo et al. [28]. This kind of geometric inconsistency in a temporal context can also be observed in Fig.…”
Section: Quantitative Results of System Stability (supporting)
confidence: 71%
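The instability described above — per-frame predictions whose temporal spread rivals their error against ground truth — can be made concrete with a small metric sketch. The function names and the static-scene assumption are ours, not the citing paper's.

```python
import numpy as np

def temporal_flicker(depth_stack):
    """Mean per-pixel standard deviation of depth over time.

    depth_stack: (T, H, W) predicted depths for T frames of a static scene.
    For a temporally consistent predictor this should be near zero.
    """
    return float(np.mean(np.std(depth_stack, axis=0)))

def mae(depth_stack, gt):
    """Mean absolute error of all frames' predictions vs. ground truth (H, W)."""
    return float(np.mean(np.abs(depth_stack - gt[None])))
```

Under this reading, "standard deviation comparable to the MAE" means `temporal_flicker` is of the same order as `mae`: the frame-to-frame jitter contributes as much as the systematic depth error.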
“…Another way to describe stability is the consistency of the predicted depth. Predictions made by supervised monocular depth estimation often flicker due to independent per-frame processing [28]. In our system, although depth is only assigned to a sphere annotation when the cursor is clicked at a pixel, depth consistency between frames is still relevant when depth is continuously read and sphere annotations are consecutively made as the pressed cursor is dragged.…”
Section: Annotation Stability Evaluation (mentioning)
confidence: 99%
“…Since a model trained by the general self-supervised monocular depth estimation method predicts relative depth for a single frame, flicker may occur when it is applied to consecutive images [22]. Patil et al. [23] improve depth accuracy using spatiotemporal information by concatenating the encoder output of the previous frame with that of the current frame before decoding.…”
Section: Depth Feedback Network (mentioning)
confidence: 99%
“…In a study measuring the coverage of colonoscopy based on self-supervised learning [6], the view-synthesis loss [20] and in-network prediction of the camera intrinsic matrix [21] are applied. However, the depth obtained by monocular learning-based methods often flickers owing to scale ambiguity and per-frame prediction [22]. In recent research, recurrent depth estimation using temporal information [23] and multi-view reconstruction using spatial information [24] were proposed to exploit spatiotemporal information.…”
Section: Introduction (mentioning)
confidence: 99%
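The scale ambiguity mentioned in this statement — monocular predictions being defined only up to an unknown scale — is commonly handled by median scaling before evaluation. A minimal sketch of that convention (the function name is ours; the citing paper does not specify its alignment procedure):

```python
import numpy as np

def median_align(pred, gt):
    """Rescale a predicted depth map to ground-truth scale.

    Standard per-frame median scaling: multiply the prediction by
    median(gt) / median(pred). Depths must be positive.
    """
    return pred * (np.median(gt) / np.median(pred))
```

Aligning each frame independently hides per-frame scale drift; applying one shared scale to the whole video instead exposes exactly the frame-to-frame flicker the quote describes.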