2018
DOI: 10.1109/tip.2018.2836318

Deep Monocular Depth Estimation via Integration of Global and Local Predictions

Abstract: Recent works on machine learning have greatly advanced the accuracy of single-image depth estimation. However, the resulting depth images are still over-smoothed and perceptually unsatisfying. This paper casts depth prediction from a single image as a parametric learning problem. Specifically, we propose a deep variational model that effectively integrates heterogeneous predictions from two convolutional neural networks (CNNs), named global and local networks. They have contrasting network architectures and are d…

Citations: cited by 93 publications (66 citation statements)
References: 39 publications
“…Various datasets have been proposed that are suitable for monocular depth estimation, i.e. they consist of RGB images with corresponding depth annotation of some form [3], [11], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [40], [44], [45], [46], [47], [48]. Datasets differ in captured environments and objects (indoor/outdoor scenes, dynamic objects), type of depth annotation (sparse/dense, absolute/relative depth), accuracy (laser, time-of-flight, SfM, stereo, human annotation, synthetic data), image quality and camera settings, as well as dataset size.…”
Section: Existing Datasets (mentioning)
confidence: 99%
“…We thus recreate the ground truth according to the procedure outlined by the original authors. DIML Indoor [31] (DL) is an RGB-D dataset of predominantly static indoor scenes, captured with a Kinect v2. Test datasets.…”
Section: Existing Datasets (mentioning)
confidence: 99%
“…The work of [22] is representative of the early use of deep learning to solve monocular depth estimation. The work of [24] integrates heterogeneous predictions from global and local networks. Andrea Pilzer et al. [34] train a student network to predict a disparity map and a backward cycle network that generates an image to re-synthesize the input image.…”
Section: B. Benchmark Performance, 1) Compared Methods (mentioning)
confidence: 99%
“…Fu et al. [13] apply dilated convolution for multi-scale receptive fields and develop a full-image encoder for global image properties. Kim et al. [24] integrate heterogeneous predictions from global and local networks. These methods struggle to maintain high-resolution predictions.…”
Section: A. Multi-scale Information (mentioning)
confidence: 99%
“…However, substantial computational resources and a large amount of data are required to train a deep learning model from scratch. In the literature, some authors have proposed data augmentation techniques to enlarge the dataset for suitable training of the deep learning model [13], [14]. This is not a good approach, as it increases training computation and may not be suitable for real-time scenarios.…”
Section: Introduction (mentioning)
confidence: 99%