Decoder Modulation for Indoor Depth Completion

Senushkin, Dmitry; Романов, М.С.; Belikov, Ilia; Konushin, Anton; Patakin, Nikolay

doi:10.48550/arxiv.2005.08607

Cited by 5 publications

(9 citation statements)

References 24 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…After refining the depth for mirrors, our Mirror3D dataset does not contain any depth value < 0.00001 and so all pixels are included in the evaluation. sensor-D * * 0.168 ± 0.037 0.996 ± 0.001 0.894 ± 0.013 0.227 ± 0.041 0.998 ± 0.001 0.901 ± 0.013 0.353 ± 0.048 0.999 ± 0.000 0.912 ± 0.012 0.527 ± 0.047 0.999 ± 0.000 0.931 ± 0.011 0.695 ± 0.043 1.000 ± 0.000 0.950 ± 0.010 sensor-D * Mirror3DNet 0.206 ± 0.025 0.969 ± 0.004 0.873 ± 0.008 0.290 ± 0.029 0.977 ± 0.004 0.888 ± 0.008 0.495 ± 0.031 0.987 ± 0.003 0.917 ± 0.007 0.745 ± 0.025 0.994 ± 0.002 0.957 ± 0.006 0.890 ± 0.017 0.998 ± 0.001 0.980 ± 0.004 RGBD ref saic [31] 0.176 ± 0.021 0.917 ± 0.010 0.825 ± 0.010 0.267 ± 0.025 0.969 ± 0.005 0.881 ± 0.007 0.474 ± 0.029 0.995 ± 0.001 0.927 ± 0.006 0.685 ± 0.027 0.998 ± 0.001 0.956 ± 0.005 0.828 ± 0.021 0.999 ± 0.000 0.975 ± 0.004 RGBD raw saic [31] 0.165 ± 0.021 0.876 ± 0.013 0.791 ± 0.014 0.235 ± 0.024 0.941 ± 0.009 0.853 ± 0.011 0.372 ± 0.028 0.986 ± 0.003 0.902 ± 0.008 0.548 ± 0.028 0.999 ± 0.000 0.931 ± 0.007 0.699 ± 0.025 1.000 ± 0.000 0.951 ± 0.006 RGBD raw saic [31] + Mirror3DNet 0.202 ± 0.025 0.840 ± 0.014 0.762 ± 0.015 0.296 ± 0.029 0.913 ± 0.010 0.834 ± 0.012 0.519 ± 0.031 0.967 ± 0.005 0.902 ± 0.008 0.757 ± 0.025 0.992 ± 0.002 0.955 ± 0.006 0.889 ± 0.017 0.998 ± 0.001 0.979 ± 0.004 RGB ref BTS [18] 0.168 ± 0.018 0.233 ± 0.011 0.228 ± 0.011 0.327 ± 0.026 0.425 ± 0.017 0.416 ± 0.016 0.651 ± 0.030 0.731 ± 0.017 0.718 ± 0.017 0.867 ± 0.020 0.923 ± 0.009 0.912 ± 0.010 0.949 ± 0.012 0.976 ± 0.006 0.970 ± 0.007 RGB ref VNL [40] 0.000 ± 0.000 0.001 ± 0.000 0.001 ± 0.000 0.000 ± 0.000 0.002 ± 0.001 0.002 ± 0.001 0.000 ± 0.000 0.007 ± 0.002 0.007 ± 0.002 0.022 ± 0.010 0.029 ± 0.006 0.029 ± 0.006 0.071 ± 0.019 0.095 ± 0.014 0.093 ± 0.014 RGB raw BTS [18] 0.161 ± 0.030 0.238 ± 0.021 0.225 ± 0.020 0.301 ± 0.044 0.450 ± 0.030 0.427 ± 0.028 0.541 ± 0.052 0.812 ± 0.024 0.764 ± 0.025 0.712 ± 0.045 0.951 ± 0.011 0.903 ± 0.015 0.811 ± 0.036 0.983 ± 0.007 0.948 ± 0.011 RGB raw VNL [40] 0.015 ± 0.011 0.035 ± 0.009 0.032 ± 0.008 0.032 ± 0.022 0.072 ± 0.019 0.065 ± 0.017 0.047 ± 0.026 0. sensor-D * * 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 sensor-D * * 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 sensor-D * Mirror3DNet 0.508 ± 0.031 0.972 ± 0.004 0.915 ± 0.007 0.555 ± 0.031 0.979 ± 0.004 0.924 ± 0.007 0.638 ± 0.028 0.988 ± 0.003 0.941 ± 0.006 0.799 ± 0.022 0.995 ± 0.002 0.970 ± 0.004 0.932 ± 0.012 0.998 ± 0.001 0.990 ± 0.002 RGBD ref saic [31] 0.628 ± 0.029 0.920 ± 0.010 0.874 ± 0.011 0.710 ± 0.027 0.970 ± 0.005 0.928 ± 0.007 0.807 ± 0.024 0.995 ± 0.001 0.963 ± 0.005 0.880 ± 0.019 0.998 ± 0.001 0.976 ± 0.004 0.944 ± 0.013 0.999 ± 0.000 0.987 ± 0.003 RGBD raw saic [31] 0.877 ± 0.016 0.878 ± 0.013 0.887 ± 0.011 0.932 ± 0.014 0.942 ± 0.009 0.947 ± 0.007 0.976 ± 0.009 0.987 ± 0.003 0.988 ± 0.003 0.997 ± 0.002 0.999 ± 0.000 0.999 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 RGBD raw saic [31] .000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 0.000 ± 0.000 1.000 ± 0.000 1.000 ± 0.000 1.000 ± ...…”

Section: Additional Quantitative Resultsmentioning

confidence: 99%

“…The term depth completion is used when the input is RGBD, where the D (depth) channel is noisy and may have missing values. Existing methods for single-view depth estimation [1,4,9,10,18,19,24,29,30,40] and depth completion [15,25,27,31,42] improve depth prediction for the entire image, relying on reconstructed 3D mesh data that is assumed to provide accurate depth. Chabra et al [5] show that an exclusion mask for noisy areas such as reflective surfaces can result in better reconstruction.…”

Section: D Plane Detection and Plane Reconstructionmentioning

confidence: 99%

“…We used three different datasets as the ground truth for the purposes of the evaluation reported here: NYUv2- Figure 4: Visualizations of depth frames, depth errors against ground truth (RMSE mapped to colormap), and resulting 3D point clouds (PC) for NYUv2 (top) and Matterport3D (bottom). We compare the output depth from state-of-the-art RGB-based depth estimation approaches (bts [18], vnl [40]) and an RGB-based depth completion approach (saic [31]). We contrast the outputs from these approaches when trained directly on the corrected datasets which leverage our 3D mirror plane annotations (NYUv2-ref and MP3D-mesh-ref), against output of the approaches when trained on the uncorrected original datasets (NYUv2-raw and MP3D-mesh-ref), and against outputs after refinement using our Mirror3DNet module.…”

Section: Quantitative Evaluationmentioning

confidence: 99%

See 2 more Smart Citations

Mirror3D: Depth Refinement for Mirror Surfaces

Tan

Lin

Chang

et al. 2021

Preprint

View full text Add to dashboard Cite

Input image + mirror mask Raw depth Original point cloud Refined depth Refined point cloud Figure 1: We present the task of 3D mirror plane prediction and depth refinement. First, we annotate several popular RGBD datasets (Matterport3D [6], ScanNet [7], NYUv2 [32]) with 3D mirror planes. Our benchmarks show that both existing RGBD dataset 'ground truth' raw depth data, and state-of-the-art depth estimation and depth completion methods exhibit dramatic errors on mirror surfaces. We propose an architecture for 3D mirror plane estimation that refines depth estimates and produces more reliable reconstructions (compare left and right depth and point cloud pairs from NYUv2 [32] dataset).

show abstract

Section: Additional Quantitative Resultsmentioning

confidence: 99%

Section: D Plane Detection and Plane Reconstructionmentioning

confidence: 99%

Section: Quantitative Evaluationmentioning

confidence: 99%

See 1 more Smart Citation

Mirror3D: Depth Refinement for Mirror Surfaces

Tan

Lin

Chang

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…The decoder of the Depth Completion module consists of spatially-adaptive denormalization (SPADE) blocks, first introduced in [46]. Our usage of SPADE in the encoder-decoder Depth Completion module is a variant of [47]. This module enables us to learn spatially-dependent scale and bias for decoder feature maps, which helps reduce the domain shift between RGB and depth, as introduced by the empty depths on the depth map.…”

Section: Depth Completionmentioning

confidence: 99%

“…Gridding Loss bypasses the un-orderedness of point clouds and is evaluated on the 3D grid. The depth completion network is trained using log L 1 pair-wise loss which forces the pairs of pixels in the predicted depth to regress to similar values as the corresponding pairs in the ground truth depth [47]. Let G describe the set of pixels where the ground truth depth is non-zero, i and j are the pixel pairs, and y and y * denote the ground truth and predicted depths, respectively.…”

Section: Loss Functionmentioning

confidence: 99%

Seeing Glass: Joint Point Cloud and Depth Completion for Transparent Objects

Xu,

Wang,

Eppel

et al. 2021

Preprint

View full text Add to dashboard Cite

The basis of many object manipulation algorithms is RGB-D input. Yet, commodity RGB-D sensors can only provide distorted depth maps for a wide range of transparent objects due light refraction and absorption. To tackle the perception challenges posed by transparent objects, we propose TranspareNet, a joint point cloud and depth completion method, with the ability to complete the depth of transparent objects in cluttered and complex scenes, even with partially filled fluid contents within the vessels. To address the shortcomings of existing transparent object data collection schemes in literature, we also propose an automated dataset creation workflow that consists of robot-controlled image collection and vision-based automatic annotation. Through this automated workflow, we created Toronto Transparent Objects Depth Dataset (TODD), which consists of nearly 15000 RGB-D images. Our experimental evaluation demonstrates that TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets, including Clear-Grasp, and that it also handles cluttered scenes when trained on TODD. Code and dataset will be released at https://www.pair.toronto.edu/TranspareNet/

show abstract