“…Lee et al [37] demonstrated that by using DenseNet161 [18] as the encoder backbone for the NYUv2 [4] dataset, their method's accuracy was higher than when using ResNet101 [17]. Song et al [41] further demonstrated in their ablation studies that for the KITTI [5] dataset, the ResNeXt [36] encoder provides the best performance for their model, which matched the findings by Lee et al [37] for this dataset. Ablation studies from Bhat et al [21] illustrate that the use of the EfficientNet-B5 [1] can produce very good predictive performance with a basic decoder.…”
Section: Encoders For Monocular Depth Estimation (supporting)
confidence: 55%
“…On top of their dilated ResNet-101 backbone in Stage 1, they use the ASPP module [20] to gather global contextual information in Stage 2. ASPP modules in different forms have since been adopted by Yin et al [35], Lee et al [37] and Song et al [41].…”
Section: Dilated Convolutions In Depth Estimation (mentioning)
confidence: 99%
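The ASPP idea referenced in the snippet above — probing the same feature map with convolutions at several dilation rates to gather context at multiple scales — can be illustrated with a minimal NumPy sketch. This is not the implementation from any of the cited papers; the fixed averaging kernel and the function names `dilated_conv2d` and `aspp` are illustrative assumptions (in a real network the kernels are learned and the branches include 1x1 convolutions and image-level pooling).

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Naive 'same'-padded 2D convolution of a single-channel map x
    with a 3x3 kernel whose taps are spaced `rate` pixels apart."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[i * rate:i * rate + x.shape[0],
                                     j * rate:j * rate + x.shape[1]]
    return out

def aspp(x, rates=(1, 2, 4)):
    """Toy ASPP: apply the same kernel at several dilation rates in
    parallel and stack the responses along a channel axis."""
    kernel = np.full((3, 3), 1.0 / 9.0)  # toy fixed smoothing kernel
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])
```

Larger rates enlarge the receptive field without adding parameters or reducing resolution, which is why ASPP variants are attractive for dense prediction tasks like depth estimation.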
“…Merging low-resolution features with high-resolution features in decoders for monocular depth estimation transfers strong global contextual information from the lower resolutions to the higher-resolution reconstructions. Lee et al [37] and Song et al [41] both employ a variation of this method to improve their model's predictive performance. We propose a simpler component to reduce the computational overhead for a more efficient and accurate decoder structure.…”
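The feature-merging scheme described in the snippet above — upsampling coarse, context-rich features to the fine grid and combining them with high-resolution features — can be sketched in a few lines of NumPy. This is a generic illustration under stated assumptions (nearest-neighbour upsampling, channel-wise concatenation), not the specific component proposed by any of the cited works.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def merge_features(low, high):
    """Bring low-res features onto the high-res grid and concatenate
    along the channel axis, so fine-scale decoding layers can see
    both local detail and global context."""
    factor = high.shape[1] // low.shape[1]
    return np.concatenate([upsample_nearest(low, factor), high], axis=0)
```

In a learned decoder the concatenation is typically followed by a convolution that fuses the two streams; the concatenation itself is the cheap part, which is why reducing the surrounding overhead is the focus of the proposed component.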
Depth estimation is an essential component in computer vision systems for achieving 3D scene understanding. Efficient and accurate depth map estimation has numerous applications including self-driving vehicles and virtual reality. This paper presents a new deep network, called D-Net, for depth estimation from a single RGB image. The proposed network is designed as an efficient, accurate and universal model that can adopt a wide range of encoder backbones. Our approach gathers strong global and local contextual features at multiple resolutions and transfers these to high resolutions for clearer depth maps. For the encoder backbone we adopt state-of-the-art models including EfficientNet [1], HRNet [2] and Swin Transformer [3] to obtain densely labelled depth maps. The proposed D-Net can be trained end-to-end and is designed to have minimal parameters and reduced computational complexity. Extensive evaluations on the NYUv2 [4] and KITTI [5] benchmark datasets show that our model is highly accurate across multiple backbones and achieves state-of-the-art performance on both benchmark datasets when combined with the Swin Transformer and HRNet backbones.
“…They employed a reinforcement learning algorithm to automatically prune redundant channels of MDE by finding a relatively optimal pruning policy. Song et al [28] proposed a simple but effective scheme by incorporating the Laplacian pyramid into the decoder architecture. Specifically, encoded features were fed into different streams for decoding depth residuals.…”
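The Laplacian-pyramid decoding mentioned in the snippet above rests on a classical decomposition: an image is split into per-scale residuals plus a coarse low-pass band, and summing the residuals back scale by scale reconstructs it exactly. The NumPy sketch below shows that decomposition under simplifying assumptions (2x2 average pooling and nearest-neighbour upsampling rather than Gaussian filtering); it is an illustration of the pyramid itself, not the cited decoder.

```python
import numpy as np

def downsample(img):
    """2x2 average pooling (assumes even spatial dimensions)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Nearest-neighbour 2x upsampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels=3):
    """Per-scale residuals plus the coarsest low-pass image."""
    pyramid, current = [], img
    for _ in range(levels):
        low = downsample(current)
        pyramid.append(current - upsample(low))  # detail lost at this scale
        current = low
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: upsample and add residuals, coarse to fine."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current
```

Decoding depth residuals stream-by-stream in this way lets each decoder branch specialise in one frequency band of the depth map.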
Predicting a convincing depth map from a single monocular image is a daunting task in the field of computer vision. In this paper, we propose a novel detail-preserving depth estimation (DPDE) algorithm based on a modified fully convolutional residual network and a gradient network. Specifically, we first introduce a new deep network that combines the fully convolutional residual network (FCRN) and a U-shaped architecture to generate the global depth map. Meanwhile, an efficient feature similarity-based loss term is introduced to train this network better. Then, we devise a gradient network to generate the local details of the scene based on gradient information. Finally, an optimization-based fusion scheme is proposed to integrate the depth and depth gradients to generate a reliable depth map with better details. Three benchmark RGBD datasets are evaluated qualitatively and quantitatively; the experimental results show that the designed depth prediction algorithm is superior to several classic depth prediction approaches and can reconstruct plausible depth maps.
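The optimization-based fusion described in the abstract above — combining a coarse depth estimate with separately predicted depth gradients — is commonly posed as a least-squares problem. The 1D NumPy sketch below is a hypothetical, simplified version of such a scheme (the function name `fuse_depth_1d`, the weighting `lam`, and the 1D setting are all assumptions, not the paper's formulation): it seeks the depth signal closest to the coarse estimate whose finite differences match the target gradients.

```python
import numpy as np

def fuse_depth_1d(d0, g, lam=10.0):
    """Solve min_d ||d - d0||^2 + lam^2 ||D d - g||^2, where D is the
    finite-difference operator, d0 a coarse depth signal and g the
    target gradients. Returns the fused depth."""
    n = d0.size
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    A = np.vstack([np.eye(n), lam * D])   # stack data and gradient terms
    b = np.concatenate([d0, lam * g])
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d
```

Raising `lam` trusts the gradient network more (sharper edges), while lowering it trusts the global depth network more; the 2D analogue leads to a Poisson-like linear system.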
“…The first works in this area [14], [15] used ground truth depth for supervised learning. Later research contributed mainly by proposing architectural innovations [16]-[19]. All these methods rely on accurate ground truth labels at training time, which is not trivial to obtain in many application domains.…”
Estimating depth from endoscopic images is a pre-requisite for a wide set of AI-assisted technologies, namely accurate localization, measurement of tumors, or identification of non-inspected areas. As the domain specificity of colonoscopies (a deformable low-texture environment with fluids, poor lighting conditions and abrupt sensor motions) poses challenges to multi-view approaches, single-view depth learning stands out as a promising line of research. In this paper, we explore for the first time Bayesian deep networks for single-view depth estimation in colonoscopies. Their uncertainty quantification offers great potential for such a critical application area. Our specific contribution is two-fold: 1) an exhaustive analysis of Bayesian deep networks for depth estimation in three different datasets, highlighting challenges and conclusions regarding synthetic-to-real domain changes and supervised vs. self-supervised methods; and 2) a novel teacher-student approach to deep depth learning that takes into account the teacher uncertainty.