2020
DOI: 10.1109/lra.2020.3017478
|View full text |Cite
|
Sign up to set email alerts
|

Don’t Forget The Past: Recurrent Depth Estimation from Monocular Video

Abstract: Autonomous cars need continuously updated depth information. Thus far, depth is mostly estimated independently for a single frame at a time, even if the method starts from video input. Our method produces a time series of depth maps, which makes it an ideal candidate for online learning approaches. In particular, we put three different types of depth estimation (supervised depth prediction, self-supervised depth prediction, and self-supervised depth completion) into a common framework. We integrate the corresp… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

1
56
0

Year Published

2021
2021
2022
2022

Publication Types

Select...
5
1
1

Relationship

0
7

Authors

Journals

citations
Cited by 95 publications
(57 citation statements)
references
References 42 publications
(109 reference statements)
1
56
0
Order By: Relevance
“…where α = 0.85 is a balancing weight and SSIM is a method of comparing and evaluating the quality of the predicted image with the original image. It is an index frequently used for depth estimation [17,21,23,33,37]. The SSIM between two images I x and I y is defined by:…”
Section: Image Reconstruction Lossmentioning
confidence: 99%
See 2 more Smart Citations
“…where α = 0.85 is a balancing weight and SSIM is a method of comparing and evaluating the quality of the predicted image with the original image. It is an index frequently used for depth estimation [17,21,23,33,37]. The SSIM between two images I x and I y is defined by:…”
Section: Image Reconstruction Lossmentioning
confidence: 99%
“…Since the model trained by the general self-supervised monocular depth estimation method predicts the relative depth for a single frame, flicker may occur when applied to consecutive images [22]. Patil et al [23] improves the depth accuracy based on spatiotemporal information by concatenating the encoding output of the previous frame with the encoding output of the current frame and decoding it. In a recent study [22], performance was improved by proposing optical flow-based loss including geometry consistency, but real-time execution is impossible because of an additional operation that requires learning at test time.…”
Section: Depth Feedback Networkmentioning
confidence: 99%
See 1 more Smart Citation
“…There is a large body of prior work on depth estimation across multiple frames [17,31,8,30,18,28,19]. Liu et al [17] aggregate per-frame depth estimates across frames using Bayesian filtering.…”
Section: B Multi-frame Depth Estimationmentioning
confidence: 99%
“…Matthies et al [18] use a similar Bayesian approach, but their method is only applied to controlled scenes and restricted camera motion. Other works [31,8,28,19] use RNNs for predicting depth maps at each frame. All of aforementioned works try to predict the full 2D depth map of the environment from monocular images.…”
Section: B Multi-frame Depth Estimationmentioning
confidence: 99%