Dense Depth Posterior (DDP) From Single Image and Sparse Range

Yang, Yanchao; Wong, Alex; Soatto, Stefano

doi:10.1109/cvpr.2019.00347

Cited by 114 publications

(143 citation statements)

References 44 publications

Supporting

Mentioning: 139

Contrasting

“…Compared to [14] and [28], our VGG11 model has a 76.1% and 61.5% reduction in the encoder parameters and 65.1% and 48.4% overall, respectively. Our VGG8 model has a 89.9% and 83.9% reduction in the encoder and 80% and 66% overall compared to that of [14] and [28], respectively. Despite having fewer parameters, our method outperforms that of [14,28].…”

Section: Network Architecture (mentioning)

confidence: 93%

“…This is in contrast to other unsupervised methods [14] (who follows early fusion and concatenates features from the two branches after the first convolution) and [28] (late fusion) - both of whom use ResNet34 encoders with ≈ 23.8M and ≈ 14.8M parameters, respectively. Both [14,28] employ the same decoder with ≈ 4M parameters - totaling to ≈ 27.8M and ≈ 18.8M parameters, respectively. Compared to [14] and [28], our VGG11 model has a 76.1% and 61.5% reduction in the encoder parameters and 65.1% and 48.4% overall, respectively.…”

Section: Network Architecture (mentioning)

confidence: 98%
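
The parameter figures quoted in the statement above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the quoted numbers, not code from the cited paper; the helper names (`total_params`, `size_after_reduction`) are hypothetical, and all counts are the approximate values given in the quote.

```python
# Approximate parameter counts (in millions) quoted in the statement above.
ENCODER_M = {"[14]": 23.8, "[28]": 14.8}
DECODER_M = 4.0  # both baselines are said to share the same decoder


def total_params(encoder_m, decoder_m=DECODER_M):
    """Encoder plus decoder parameters, in millions."""
    return encoder_m + decoder_m


def size_after_reduction(baseline_m, reduction_pct):
    """Model size implied by a percentage reduction from a baseline."""
    return baseline_m * (1.0 - reduction_pct / 100.0)


# Totals quoted as roughly 27.8M and 18.8M.
totals = {ref: total_params(enc) for ref, enc in ENCODER_M.items()}

# VGG11 encoder size implied by the quoted 76.1% / 61.5% reductions.
vgg11_encoder = {
    "[14]": size_after_reduction(ENCODER_M["[14]"], 76.1),
    "[28]": size_after_reduction(ENCODER_M["[28]"], 61.5),
}

# VGG11 overall size implied by the quoted 65.1% / 48.4% reductions.
vgg11_total = {
    "[14]": size_after_reduction(totals["[14]"], 65.1),
    "[28]": size_after_reduction(totals["[28]"], 48.4),
}

for ref in ENCODER_M:
    print(ref, round(vgg11_encoder[ref], 2), round(vgg11_total[ref], 2))
```

Both baselines imply about the same VGG11 encoder size (roughly 5.7M parameters) and the same overall size (roughly 9.7M), so the quoted percentages are mutually consistent.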

“…[22] used a local smoothness term, but instead minimized the photometric error between rectified stereo-pairs where pose is known. [28] also leveraged stereo pairs and a more sophisticated photometric loss (SSIM [27]), and replaced the generic smoothness term with a conditional prior to measure compatibility between the prediction and a learned depth model obtained by training a separate network on ground-truth depth. This method can be considered semi-unsupervised, and requires ground truth for training the prior.…”

Section: Related Work (mentioning)

confidence: 99%

“…We propose two encoder-decoder architectures with skip connections following the late fusion paradigm [9,28]. Each encoder has an image branch and a depth branch - the image branch contains 75% of the total features in the encoder and the depth branch 25%.…”

Section: Network Architecture (mentioning)

confidence: 99%
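
To make the 75%/25% split in the statement above concrete, here is a minimal sketch that divides a layer's feature count between an image branch and a depth branch. It assumes "features" means per-layer channel counts, which the quote does not state; the layer widths and the helper `split_features` are hypothetical and are not taken from the citing paper.

```python
def split_features(total, image_fraction=0.75):
    """Split a layer's feature count between image and depth branches."""
    image = int(round(total * image_fraction))
    depth = total - image  # the remainder goes to the depth branch
    return image, depth


# Hypothetical per-layer widths for a VGG-style encoder.
layer_widths = [64, 128, 256, 512]
splits = [split_features(width) for width in layer_widths]

for width, (image, depth) in zip(layer_widths, splits):
    print(f"total={width}: image branch={image}, depth branch={depth}")
```

Under late fusion, the two branches would process their inputs separately and be combined only at the end of the encoder, so the per-layer split above applies to every encoder stage.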

“…Both [14,28] employ the same decoder with ≈ 4M parameters - totaling to ≈ 27.8M and ≈ 18.8M parameters, respectively. Compared to [14] and [28], our VGG11 model has a 76.1% and 61.5% reduction in the encoder parameters and 65.1% and 48.4% overall, respectively. Our VGG8 model has a 89.9% and 83.9% reduction in the encoder and 80% and 66% overall compared to that of [14] and [28], respectively.…”

Section: Network Architecture (mentioning)

confidence: 99%

Unsupervised Depth Completion From Visual Inertial Odometry

Wong, Fei, Tsuei, et al. 2020

IEEE Robot. Autom. Lett.

Self Cite

188

We describe a method to infer dense depth from camera motion and sparse depth as estimated using a visual-inertial odometry system. Unlike other scenarios using point clouds from lidar or structured light sensors, we have few hundreds to few thousand points, insufficient to inform the topology of the scene. Our method first constructs a piecewise planar scaffolding of the scene, and then uses it to infer dense depth using the image along with the sparse points. We use a predictive cross-modal criterion, akin to "self-supervision," measuring photometric consistency across time, forward-backward pose consistency, and geometric compatibility with the sparse point cloud. We also launch the first visual-inertial + depth dataset, which we hope will foster additional exploration into combining the complementary strengths of visual and inertial sensors. To compare our method to prior work, we adopt the unsupervised KITTI depth completion benchmark, and show state-of-the-art performance on it.