Real-Time Monocular Depth Estimation Using Synthetic Data with Domain Adaptation via Image Style Transfer

Atapour-Abarghouei, Amir; Breckon, Toby P.

doi:10.1109/cvpr.2018.00296

Cited by 251 publications

(181 citation statements)

References 62 publications

(142 reference statements)

Citation statement types: Supporting; Mentioning (180); Contrasting

“…However, supervised learning-based approaches rely on expensive ground-truth depth data for training and are not flexible to be deployed in novel environments. Even if synthetic data generation has been proposed to partially tackle this issue [26], the cost of synthesizing realistic data remains high.…”

Section: Supervised Depth Estimation (mentioning)

confidence: 99%

Progressive Fusion for Unsupervised Binocular Depth Estimation Using Cycled Networks

Pilzer

Lathuilière

et al. 2020

IEEE Trans. Pattern Anal. Mach. Intell.


Recent deep monocular depth estimation approaches based on supervised regression have achieved remarkable performance. However, they require costly ground truth annotations during training. To cope with this issue, in this paper we present a novel unsupervised deep learning approach for predicting depth maps. We introduce a new network architecture, named Progressive Fusion Network (PFN), that is specifically designed for binocular stereo depth estimation. This network is based on a multi-scale refinement strategy that combines the information provided by both stereo views. In addition, we propose to stack this network twice to form a cycle. This cycle approach can be interpreted as a form of data augmentation since, at training time, the network learns not only from the training-set images (in the forward half-cycle) but also from the synthesized images (in the backward half-cycle). The architecture is jointly trained with adversarial learning. Extensive experiments on the publicly available datasets KITTI, Cityscapes and ApolloScape demonstrate the effectiveness of the proposed model, which is competitive with other unsupervised deep learning methods for depth prediction.
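The abstract describes a forward half-cycle that learns from real stereo images and a backward half-cycle that learns from synthesized ones, without spelling out the synthesis step. The sketch below is a generic illustration of such a cycle, not the PFN code: it assumes rectified grayscale images and treats the disparities as given, whereas in PFN they would come from the multi-scale network (the adversarial loss is also omitted). `warp_horizontal` and `photometric_l1` are hypothetical helper names.

```python
import numpy as np


def warp_horizontal(src: np.ndarray, disparity: np.ndarray, sign: int) -> np.ndarray:
    """Resample each row of `src` at x + sign * disparity (linear interpolation).

    For rectified stereo, a left-image pixel at column x appears at x - d in the
    right image, so the left view is rebuilt from the right with sign=-1 and the
    right view from the left with sign=+1. Out-of-range samples are clamped to
    the border, which is where occlusion artefacts appear.
    """
    h, w = src.shape
    xs = np.clip(np.arange(w)[None, :] + sign * disparity, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1 - frac) * src[rows, x0] + frac * src[rows, x1]


def photometric_l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute intensity difference, the simplest reconstruction loss."""
    return float(np.mean(np.abs(a - b)))


# Toy rectified pair with a constant disparity of 2 pixels (hypothetical data).
left = np.tile(np.arange(8, dtype=float), (3, 1))
right = np.minimum(left + 2, 7)   # right(x) = left(x + 2), clamped at the edge
disp = np.full_like(left, 2.0)    # in PFN, disparities would come from the network

# Forward half-cycle: synthesize the left view from the real right view.
left_hat = warp_horizontal(right, disp, sign=-1)
# Backward half-cycle: synthesize the right view from the *synthesized* left view.
right_hat = warp_horizontal(left_hat, disp, sign=+1)

print(photometric_l1(left_hat, left), photometric_l1(right_hat, right))  # 0.375 0.0
```

With a constant 2-pixel disparity the backward half-cycle recovers the right view exactly, while the forward half-cycle is wrong only in the two left-most columns, which have no counterpart in the right image.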



“…The primary reason for using synthetic images [17] during training is that despite the increased depth density of the real-world imagery [54], depth information for the majority of the scene is still missing, leading to undesirable artefacts in regions where depth values are not available. A naïve solution would be to only use synthetic data to resolve the issue, but due to differences in the data domains, a model only trained on synthetic data cannot be expected to perform well on real-world images without domain adaptation [5,63]. Consequently, we opt for randomly sampling training images from both datasets to force the overall model to capture the underlying distribution of both data domains, and therefore, learn the full dense structure of a synthetic scene while simultaneously modelling the contextual complexity of the naturally-sensed real-world images.…”

Section: Proposed Approach (mentioning)

confidence: 99%
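The quote above describes randomly sampling training images from a synthetic and a real-world dataset. A minimal sketch of that idea follows, under assumptions the quote does not state: each example is drawn independently with a hypothetical probability `p_synthetic`, missing real-world depth is stored as 0, and the loss skips those pixels through a validity mask. The datasets and function names are illustrative only.

```python
import numpy as np

# Hypothetical toy "datasets": each item is (domain, depth map, validity mask).
# Synthetic depth is dense; real-world depth has holes, encoded here as zeros
# (an assumed convention).
synthetic_depth = np.full((2, 2), 5.0)
real_depth = np.array([[5.0, 0.0], [0.0, 0.0]])
SYNTHETIC = [("synthetic", synthetic_depth, np.ones_like(synthetic_depth, dtype=bool))]
REAL = [("real", real_depth, real_depth > 0)]


def sample_batch(rng, synthetic, real, batch_size, p_synthetic=0.5):
    """Draw every example independently from one of the two domains."""
    batch = []
    for _ in range(batch_size):
        pool = synthetic if rng.random() < p_synthetic else real
        batch.append(pool[rng.integers(len(pool))])
    return batch


def masked_l1(pred, target, valid):
    """L1 depth error averaged only over pixels that have ground truth."""
    err = np.abs(pred - target)[valid]
    return float(err.mean()) if err.size else 0.0


rng = np.random.default_rng(0)
batch = sample_batch(rng, SYNTHETIC, REAL, batch_size=8)
pred = np.full((2, 2), 4.0)  # stand-in for a network prediction
for domain, depth, valid in batch:
    print(domain, masked_l1(pred, depth, valid))  # every line ends in 1.0

# Ignoring the mask on the real sample would treat holes as zero depth:
print(float(np.abs(pred - real_depth).mean()))  # 3.25
```

The last line shows why the mask matters in this toy setting: treating holes as zero depth inflates the error from 1.0 to 3.25 and penalizes the prediction in exactly the regions where real-world depth is unavailable.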

“…• A joint multi-task framework for depth prediction encouraging improved geometric and contextual learning to boost performance. • Monocular depth estimation via adversarial training, a deep architecture with skip connections and a robust compound objective function directly supervised using this framework to outperform prior contemporary work [5,7,14,20,31,36,62,66]. • Sparse to dense depth completion via the same multitask model, capable of generating a dense depth output given a sparse depth input captured via a LiDAR sensor with results superior to prior contemporary work [10,16,40,50,54].…”

Section: Introduction (mentioning)

confidence: 98%


To Complete or to Estimate, That is the Question: A Multi-Task Approach to Depth Completion and Monocular Depth Estimation

Atapour-Abarghouei

Breckon

2019

2019 International Conference on 3D Vision (3DV)

Self-citation


Robust three-dimensional scene understanding is now an ever-growing area of research highly relevant in many real-world applications such as autonomous driving and robotic navigation. In this paper, we propose a multi-task learning-based model capable of performing two tasks: sparse depth completion (i.e. generating complete dense scene depth given a sparse depth image as the input) and monocular depth estimation (i.e. predicting scene depth from a single RGB image) via two sub-networks jointly trained end to end using data randomly sampled from a publicly available corpus of synthetic and real-world images. The first sub-network generates a sparse depth image by learning lower-level features from the scene and the second predicts a full dense depth image of the entire scene, leading to a better geometric and contextual understanding of the scene and, as a result, superior performance of the approach. The entire model can be used to infer complete scene depth from a single RGB image, or the second network can be used alone to perform depth completion given a sparse depth input. Using adversarial training, a robust objective function, a deep architecture relying on skip connections and a blend of synthetic and real-world training data, our approach is capable of producing superior high-quality scene depth. Extensive experimental evaluation demonstrates the efficacy of our approach compared to contemporary state-of-the-art techniques across both problem domains.
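The abstract chains two sub-networks (RGB to sparse depth, then sparse to dense depth) and allows the second to be used on its own for depth completion. The sketch below shows only that data flow. Both learned sub-networks are replaced by hypothetical stand-ins: `nearest_fill` is plain nearest-neighbour hole filling, not the paper's network, and `simulate_sparse` mimics a LiDAR-like input by randomly keeping pixels, an assumption made for testing rather than the authors' procedure. Zero is assumed to mark a missing measurement.

```python
import numpy as np
from typing import Callable

DepthFn = Callable[[np.ndarray], np.ndarray]


def simulate_sparse(dense: np.ndarray, keep_fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Keep a random subset of depth pixels and zero the rest (0 = no measurement)."""
    keep = rng.random(dense.shape) < keep_fraction
    return np.where(keep, dense, 0.0)


def nearest_fill(sparse: np.ndarray) -> np.ndarray:
    """Stand-in for the completion sub-network: copy the nearest measured value."""
    ys, xs = np.nonzero(sparse)
    if ys.size == 0:
        raise ValueError("sparse input has no measured pixels")
    gy, gx = np.indices(sparse.shape)
    # Squared distance from every pixel (H, W) to every measurement (N,) -> (H, W, N).
    d2 = (gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2
    nearest = d2.argmin(axis=-1)
    return sparse[ys[nearest], xs[nearest]]


def estimate_depth(rgb: np.ndarray, sparse_net: DepthFn, completion_net: DepthFn) -> np.ndarray:
    """Monocular mode: RGB image -> sparse depth -> dense depth."""
    return completion_net(sparse_net(rgb))


def complete_depth(sparse: np.ndarray, completion_net: DepthFn) -> np.ndarray:
    """Completion mode: only the second sub-network is used."""
    return completion_net(sparse)


rng = np.random.default_rng(0)
dense_gt = np.arange(1.0, 17.0).reshape(4, 4)  # toy ground-truth depth
lidar_like = simulate_sparse(dense_gt, keep_fraction=0.5, rng=rng)
completed = complete_depth(lidar_like, nearest_fill)

# Placeholder first sub-network that ignores the image and returns the sparse map.
rgb = np.zeros((4, 4, 3))
estimated = estimate_depth(rgb, lambda img: lidar_like, nearest_fill)
```

Passing the sub-networks in as callables keeps the two entry points (full monocular pipeline versus completion only) visibly sharing the same second stage.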


“…Markov random field (MRF) [38] and Conditional random field (CRF) [31] can be applied to regress image depth against monocular images. More recent approaches use deep neural networks with multi-scale predictions [11,12], large-scale datasets [26,2] and user interactions [37]. Stereo provides strong cues for unsupervised learning [14,46] or semi-supervised learning with LiDAR [24].…”

Section: Related Work (mentioning)

confidence: 99%

Counterfactual Depth from a Single RGB Image

Issaranon

Zou

Forsyth

2019

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)


We describe a method that predicts, from a single RGB image, a depth map that describes the scene when a masked object is removed. We call this "counterfactual depth"; it models hidden scene geometry together with the observations. Our method works for the same reason that scene completion works: the spatial structure of objects is simple. But we offer a much higher-resolution representation of space than current scene completion methods, as we operate at pixel-level precision and do not rely on a voxel representation. Furthermore, we do not require RGBD inputs. Our method uses a standard encoder-decoder architecture, with the decoder modified to accept an object mask. We describe a small evaluation dataset that we have collected, which allows inference about which factors affect reconstruction most strongly. Using this dataset, we show that our depth predictions for masked objects are better than those of other baselines.
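The abstract says the decoder is modified to accept an object mask and that predictions are evaluated for masked objects, but it gives neither the conditioning mechanism nor the error measure. The sketch below assumes the simplest choices, appending the mask as an extra feature channel and computing RMSE only inside the mask; both are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def add_mask_channel(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Append a binary object mask as an extra channel: (H, W, C) -> (H, W, C + 1)."""
    return np.concatenate([features, mask[..., None].astype(features.dtype)], axis=-1)


def masked_rmse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Root-mean-square depth error computed only inside the object mask."""
    mask = mask.astype(bool)
    if not mask.any():
        raise ValueError("mask is empty")
    diff = pred[mask] - target[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy scene: a wall at depth 10 with a 2x2 object at depth 4 in front of it.
observed = np.full((4, 4), 10.0)
observed[1:3, 1:3] = 4.0
mask = observed < 10.0
counterfactual_gt = np.full((4, 4), 10.0)  # the wall with the object removed

copy_baseline = observed                   # naive: keep the object's depth
guess = counterfactual_gt.copy()
guess[mask] = 9.0                          # a prediction that "sees through" the object

print(masked_rmse(copy_baseline, counterfactual_gt, mask))  # 6.0
print(masked_rmse(guess, counterfactual_gt, mask))          # 1.0
print(add_mask_channel(np.zeros((4, 4, 8)), mask).shape)    # (4, 4, 9)
```

Restricting the error to the mask keeps the score focused on the hidden geometry; averaged over the whole image, the unchanged background would dilute the difference between the two predictions.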
