2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00787

Single Image Depth Estimation Trained via Depth From Defocus Cues

Abstract: Estimating depth from a single RGB image is a fundamental task in computer vision, most directly solved using supervised deep learning. In unsupervised learning of depth from a single RGB image, depth is not given explicitly. Existing work in the field receives either a stereo pair, a monocular video, or multiple views and, using losses based on structure from motion, trains a depth estimation network. In this work, we rely on depth-from-defocus cues instead of different views.…
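The defocus cue the abstract refers to can be made concrete with the standard thin-lens circle-of-confusion model (a textbook optics formula, not code from the paper; the function name and parameters are my own illustration):

```python
def circle_of_confusion(depth, focus_depth, focal_length, aperture):
    """Thin-lens circle-of-confusion (blur-spot) diameter for an object
    at `depth` when the lens of the given `focal_length` and `aperture`
    diameter is focused at `focus_depth` (all lengths in meters)."""
    return aperture * (focal_length / (focus_depth - focal_length)) \
        * abs(depth - focus_depth) / depth
```

Objects exactly at the focus distance produce zero blur, and blur grows with the distance from the focal plane, which is what makes defocus a usable depth cue.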

Cited by 106 publications (92 citation statements)
References 45 publications
“…Zhang et al [24] considered the training loss as a combination of the point-to-point, shape, and distribution similarity between predictions and ground truth, which leveraged hierarchical structure information to guide network optimization. Alhashim et al [37] and Gur et al [38] formulated the training loss as the sum of an L1 loss and a structural similarity (SSIM) loss, seeking a balance between point-to-point differences and distortions of high-frequency details in the image domain. Hu et al [25] and Chen et al [27] analyzed the orthogonal sensitivities to different types of errors and proposed a combination of three loss terms: a point-to-point loss, a gradient loss, and a normal loss.…”
Section: Loss Function
confidence: 99%
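The L1 + SSIM combination described above can be sketched as follows (a minimal NumPy illustration; real implementations compute SSIM over local windows, and the weight `alpha` here is an assumed value, not one taken from the cited papers):

```python
import numpy as np

def l1_ssim_loss(pred, gt, alpha=0.85, c1=0.01 ** 2, c2=0.03 ** 2):
    # Point-to-point term: mean absolute depth error.
    l1 = np.abs(pred - gt).mean()
    # Simplified global SSIM (full implementations average local windows).
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(), gt.var()
    cov = ((pred - mu_p) * (gt - mu_g)).mean()
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    dssim = (1.0 - ssim) / 2.0  # structural dissimilarity, in [0, 1]
    return alpha * dssim + (1.0 - alpha) * l1
```

A perfect prediction gives zero loss from both terms; the SSIM term penalizes structural distortions that a pure point-to-point loss can miss.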
“…We investigate classic models, including CNNs of different depths, such as ResNet-18, ResNet-34, ResNet-50, DenseNet-121, DPN-68, DPN-92, and DPN-131. Among these CNNs, ResNet-50 is employed as an encoder by well-known studies [6], [8], [32], [33]. Although few methods use DPNs to predict depth, we modify DPN-92 as the encoder of DEM after considering the tradeoff between the multiply-and-accumulate operations and the accuracy of DPNs.…”
Section: A. DEM
confidence: 99%
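The multiply-and-accumulate (MAC) cost mentioned above is the usual yardstick for comparing encoder backbones. A rough helper for one convolutional layer (my own illustration, assuming "same" padding, not a function from the cited work):

```python
def conv_macs(h_in, w_in, c_in, c_out, kernel, stride=1):
    # MACs of one conv layer: each output position performs
    # kernel * kernel * c_in multiply-accumulates per output channel.
    h_out, w_out = h_in // stride, w_in // stride
    return h_out * w_out * kernel * kernel * c_in * c_out
```

Summing this over every layer of a candidate backbone gives the MAC budget that is traded off against accuracy when choosing between, say, ResNet-50 and DPN-92.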
“…In dense depth prediction, our decoder is superior because the pixel-level depth maps are learned through transposed convolution and convolution layers. By contrast, state-of-the-art methods [5], [8], [32] use decoders based on linear interpolation.…”
Section: A. DEM
confidence: 99%
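The contrast drawn above, learned transposed-convolution upsampling versus fixed interpolation, can be illustrated with a minimal NumPy sketch (single-channel, stride 2; my own simplification, not the cited architecture):

```python
import numpy as np

def nearest_upsample2x(x):
    # Fixed (non-learned) 2x upsampling, as in interpolation-based decoders.
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def transposed_conv2x(x, kernel):
    # Learned 2x upsampling: each input pixel scatters a weighted copy of
    # the kernel into the stride-2 output grid; the weights are trainable.
    k = kernel.shape[0]
    h, w = x.shape
    out = np.zeros((2 * h + k - 2, 2 * w + k - 2))
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + k, 2 * j:2 * j + k] += x[i, j] * kernel
    return out
```

With an all-ones 2x2 kernel the transposed convolution reproduces nearest-neighbor upsampling exactly, which shows that a learned decoder generalizes fixed interpolation rather than replacing it with something unrelated.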