Consistent Video Depth Estimation (2020)
DOI: 10.1145/3386569.3392377
Abstract: We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video. We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video. Unlike the ad-hoc priors in classical reconstruction, we use a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation. At test time, we fine-tune this network to satisfy the geometric constraints of a particular input video, whi…
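The test-time fine-tuning idea from the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "network" here is reduced to one multiplicative scale parameter per frame, the geometric constraints are hypothetical precomputed pixel correspondences, and the optimizer is plain numerical gradient descent.

```python
import numpy as np

def consistency_loss(scales, depths, matches):
    """Mean squared depth disagreement over matched pixel pairs.

    scales:  per-frame multiplicative corrections (the 'fine-tuned' params)
    depths:  list of per-frame 1-D depth arrays
    matches: list of (frame_a, idx_a, frame_b, idx_b) correspondences,
             standing in for the structure-from-motion constraints
    """
    err = 0.0
    for fa, ia, fb, ib in matches:
        da = scales[fa] * depths[fa][ia]
        db = scales[fb] * depths[fb][ib]
        err += (da - db) ** 2
    return err / len(matches)

def finetune(depths, matches, lr=0.05, steps=200):
    """Gradient-descend the per-frame scales to satisfy the constraints."""
    scales = np.ones(len(depths))
    eps = 1e-4
    for _ in range(steps):
        base = consistency_loss(scales, depths, matches)
        grad = np.zeros_like(scales)
        for k in range(len(scales)):
            bumped = scales.copy()
            bumped[k] += eps
            # finite-difference gradient of the consistency loss
            grad[k] = (consistency_loss(bumped, depths, matches) - base) / eps
        scales -= lr * grad
    return scales
```

In the paper the optimized quantity is the full set of network weights and the constraints come from reprojection across frame pairs; the sketch only shows the shape of the loop: initial per-frame depths go in, a consistency loss over correspondences is minimized, corrected depths come out.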

Cited by 242 publications (178 citation statements). References 74 publications (93 reference statements).
“…While the results indicate high accuracy, a standard deviation comparable to the MAE reveals low precision. The results showed rapid fluctuation of the predicted depth due to independent per-frame processing, as described in Luo et al. [28]. This kind of geometric inconsistency in a temporal context can also be observed in Fig.…”
Section: Quantitative Results of System Stability (supporting)
confidence: 71%
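The instability described above — per-frame predictions whose temporal spread rivals their error against ground truth — can be made concrete with a small metric sketch. The function names and the static-scene assumption are ours, not the citing paper's.

```python
import numpy as np

def temporal_flicker(depth_stack):
    """Mean per-pixel standard deviation of depth over time.

    depth_stack: (T, H, W) predicted depths for T frames of a static scene.
    For a temporally consistent predictor this should be near zero.
    """
    return float(np.mean(np.std(depth_stack, axis=0)))

def mae(depth_stack, gt):
    """Mean absolute error of all frames' predictions vs. ground truth (H, W)."""
    return float(np.mean(np.abs(depth_stack - gt[None])))
```

Under this reading, "standard deviation comparable to the MAE" means `temporal_flicker` is of the same order as `mae`: the frame-to-frame jitter contributes as much as the systematic depth error.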
“…Another way to describe stability is the consistency of the predicted depth. Predictions made by supervised monocular depth estimation often flicker due to independent per-frame processing [28]. In our system, although depth is only assigned to a sphere annotation when the cursor is clicked at a pixel, depth consistency between frames is still relevant when depth is continuously read and sphere annotations are consecutively made as the pressed cursor is dragged.…”
Section: Annotation Stability Evaluation (mentioning)
confidence: 99%
“…Since a model trained by the general self-supervised monocular depth estimation method predicts relative depth for a single frame, flicker may occur when it is applied to consecutive images [22]. Patil et al. [23] improve depth accuracy using spatiotemporal information by concatenating the encoder output of the previous frame with that of the current frame before decoding.…”
Section: Depth Feedback Network (mentioning)
confidence: 99%
“…In a study measuring the coverage of colonoscopy based on self-supervised learning [6], the view-synthesis loss [20] and in-network prediction of the camera intrinsic matrix [21] are applied. However, the depth obtained by monocular learning-based methods often flickers owing to scale ambiguity and per-frame prediction [22]. In recent research, recurrent depth estimation using temporal information [23] and multi-view reconstruction using spatial information [24] were proposed to exploit spatiotemporal information.…”
Section: Introduction (mentioning)
confidence: 99%
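The scale ambiguity mentioned in this statement — monocular predictions being defined only up to an unknown scale — is commonly handled by median scaling before evaluation. A minimal sketch of that convention (the function name is ours; the citing paper does not specify its alignment procedure):

```python
import numpy as np

def median_align(pred, gt):
    """Rescale a predicted depth map to ground-truth scale.

    Standard per-frame median scaling: multiply the prediction by
    median(gt) / median(pred). Depths must be positive.
    """
    return pred * (np.median(gt) / np.median(pred))
```

Aligning each frame independently hides per-frame scale drift; applying one shared scale to the whole video instead exposes exactly the frame-to-frame flicker the quote describes.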