Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation

Xu, Jialei; Liu, Xianming; Bai, Yuanchao; Jiang, Junjun; Wang, Kaixuan; Chen, Xiaozhi; Ji, Xiangyang

doi:10.1145/3503161.3548394

Cited by 6 publications

(17 citation statements)

References 69 publications

“…1) Manhattan Normal Module: Depth prediction in coplanar regions can benefit from known surface normals [68], [69]. However, estimating surface normals in indoor scenes is challenging due to pervasive large untextured planes with consistent luminosity in rooms.…”

Section: A. The Manhattan-Constraint Network (MCN) Branch (mentioning)

confidence: 99%

RGB-Depth Fusion GAN for Indoor Depth Completion

Wang, Che, et al. 2022

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

The raw depth image captured by indoor depth sensors usually has an extensive range of missing depth values due to inherent limitations such as the inability to perceive transparent objects and the limited distance range. The incomplete depth map with missing values burdens many downstream vision tasks, and a rising number of depth completion methods have been proposed to alleviate this issue. While most existing methods can generate accurate dense depth maps from sparse and uniformly sampled depth maps, they are not suitable for complementing large contiguous regions of missing depth values, which is common and critical in images captured in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure, by adhering to the Manhattan world assumption and utilizing normal maps from RGB-D information as guidance, to regress the local dense depth values from the raw depth map. In the other branch, we propose an RGB-depth fusion CycleGAN to transfer the RGB image to the fine-grained textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate the features across the two branches, and we append a confidence fusion head to fuse the two outputs of the branches for the final depth map. Extensive experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method clearly improves the depth completion performance, especially in a more realistic setting of indoor environments, with the help of our proposed pseudo depth maps in training.

“…In contrast, depth models are supposed to predict dense results with both accurate details and integral spatial structures. Due to insufficient structural information, supervised [35,36,53] or self-supervised [14,51,55] methods produce failure predictions with concave objects, erroneous outcomes, or noticeable artifacts on autonomous driving scenarios [6,16].…”

Section: Introduction (mentioning)

confidence: 99%

“…To enhance spatial structures, recent works [14,17,19,51,55] explore self-supervised manner on driving scenes [6,16]. Surround-Depth [51] employs pseudo labels from Structure-from-Motion [40] to pretrain their model.…”

Section: Introduction (mentioning)

confidence: 99%

“…They utilize pose estimation and photometric loss [12] between six cameras to restore depth structures. MCDP [55] conducts multi-camera prediction by projections between different views. However, self-supervised methods rely on pose estimation [14], which is inaccurate in natural scenes and limits the robustness of those methods.…”

Section: Introduction (mentioning)

confidence: 99%

“…To overcome these challenges, we propose a novel supervised framework with sparse annotations termed Diffusion-Augmented Depth Prediction (DADP). Our method does not rely on pose estimation and multiple cameras, achieving better robustness than self-supervised methods [14,17,19,51,55], especially for challenging night or rainy scenes. DADP consists of a noise predictor and a depth predictor.…”

Section: Introduction (mentioning)

confidence: 99%

Celebrating the 70th Anniversary of School of Mechanical Science and Engineering of Huazhong University of Science & Technology

Wen 2022

IET Collab Intel Manufact

Depth estimation aims to predict dense depth maps. In autonomous driving scenes, sparsity of annotations makes the task challenging. Supervised models produce concave objects due to insufficient structural information. They overfit to valid pixels and fail to restore spatial structures. Self-supervised methods are proposed for the problem. Their robustness is limited by pose estimation, leading to erroneous results in natural scenes. In this paper, we propose a supervised framework termed Diffusion-Augmented Depth Prediction (DADP). We leverage the structural characteristics of diffusion model to enforce depth structures of depth models in a plug-and-play manner. An object-guided integrality loss is also proposed to further enhance regional structure integrality by fetching objective information. We evaluate DADP on three driving benchmarks and achieve significant improvements in depth structures and robustness. Our work provides a new perspective on depth estimation with sparse annotations in autonomous driving scenes.

CCS Concepts: Computing methodologies → Scene understanding.