2020
DOI: 10.48550/arxiv.2005.08607
Preprint

Decoder Modulation for Indoor Depth Completion

Abstract: Accurate depth map estimation is an essential step in scene spatial mapping for AR applications and 3D modeling. Current depth sensors provide time-synchronized depth and color images in real-time, but have limited range and suffer from missing and erroneous depth values on transparent or glossy surfaces. We investigate the task of depth completion that aims at improving the accuracy of depth measurements and recovering the missing depth values using additional information from corresponding color images. Surp…

Cited by 5 publications (9 citation statements) | References: 24 publications
“…After refining the depth for mirrors, our Mirror3D dataset does not contain any depth value < 0.00001 and so all pixels are included in the evaluation. [Flattened results table omitted: accuracy values (mean ± std) for sensor-D, sensor-D + Mirror3DNet, saic [31], saic [31] + Mirror3DNet, BTS [18], and VNL [40], each evaluated on raw and refined RGBD/RGB inputs.]…”
Section: Additional Quantitative Results (mentioning)
confidence: 99%
“…The term depth completion is used when the input is RGBD, where the D (depth) channel is noisy and may have missing values. Existing methods for single-view depth estimation [1,4,9,10,18,19,24,29,30,40] and depth completion [15,25,27,31,42] improve depth prediction for the entire image, relying on reconstructed 3D mesh data that is assumed to provide accurate depth. Chabra et al. [5] show that an exclusion mask for noisy areas such as reflective surfaces can result in better reconstruction.…”
Section: 3D Plane Detection and Plane Reconstruction (mentioning)
confidence: 99%
“…The decoder of the Depth Completion module consists of spatially-adaptive denormalization (SPADE) blocks, first introduced in [46]. Our usage of SPADE in the encoder-decoder Depth Completion module is a variant of [47]. This module lets us learn a spatially-dependent scale and bias for the decoder feature maps, which helps reduce the domain shift between RGB and depth introduced by the empty regions of the depth map.…”
Section: Depth Completion (mentioning)
confidence: 99%
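For concreteness, here is a minimal sketch (assuming PyTorch) of a SPADE-style modulation block of the kind described in the statement above: the normalized decoder features receive a spatially-varying scale (gamma) and bias (beta) predicted from a guidance signal such as the incomplete depth map or its validity mask. Class and argument names are illustrative and not taken from [46], [47], or the citing paper's code.

```python
# Sketch of spatially-adaptive denormalization (SPADE) for decoder features.
# Assumption: the guidance tensor encodes where depth is valid/missing, so
# regions with empty depth can be modulated differently from measured ones.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SPADEBlock(nn.Module):
    def __init__(self, feat_channels: int, guide_channels: int, hidden: int = 64):
        super().__init__()
        # Parameter-free normalization; scale and bias come from the guidance.
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(guide_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # Resize the guidance signal to the current decoder resolution.
        guide = F.interpolate(guide, size=feat.shape[-2:], mode="nearest")
        h = self.shared(guide)
        # Spatially-dependent scale and bias applied per pixel and channel.
        return self.norm(feat) * (1 + self.gamma(h)) + self.beta(h)


# Usage: modulate 128-channel decoder features with a 1-channel sparse-depth mask.
block = SPADEBlock(feat_channels=128, guide_channels=1)
out = block(torch.randn(2, 128, 32, 32), torch.rand(2, 1, 256, 256))
```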
“…Gridding Loss bypasses the unordered nature of point clouds and is evaluated on the 3D grid. The depth completion network is trained with a log-L1 pair-wise loss, which forces pairs of pixels in the predicted depth to regress to values similar to the corresponding pairs in the ground-truth depth [47]. Let G denote the set of pixels where the ground-truth depth is non-zero, let i and j index pixel pairs, and let y and y* denote the ground-truth and predicted depths, respectively.…”
Section: Loss Function (mentioning)
confidence: 99%
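As a rough illustration, one plausible reading of this pair-wise log-L1 loss is that log-depth differences over pixel pairs drawn from G should match between prediction and ground truth. The exact pairing and normalization used in [47] may differ; the sketch below assumes PyTorch, and the random pair sampling and clamping epsilon are illustrative choices, not from the source.

```python
# Hypothetical pair-wise log-L1 loss: |(log y*_i - log y*_j) - (log y_i - log y_j)|
# averaged over sampled pairs (i, j) of pixels with valid ground truth.
import torch


def pairwise_log_l1_loss(pred: torch.Tensor, gt: torch.Tensor,
                         num_pairs: int = 4096, eps: float = 1e-6) -> torch.Tensor:
    """pred, gt: flattened (N,) depth values for one image."""
    valid = gt > 0                              # the set G of valid ground-truth pixels
    log_pred = torch.log(pred[valid].clamp(min=eps))
    log_gt = torch.log(gt[valid])
    n = log_gt.numel()
    # Sample pixel pairs (i, j) instead of enumerating all O(n^2) pairs.
    i = torch.randint(0, n, (num_pairs,), device=gt.device)
    j = torch.randint(0, n, (num_pairs,), device=gt.device)
    diff_pred = log_pred[i] - log_pred[j]
    diff_gt = log_gt[i] - log_gt[j]
    return (diff_pred - diff_gt).abs().mean()
```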