“…This paper extends the conference paper (Knöbelreiter and Pock 2019), where we additionally study (i) a model with shared parameters over the iterations, (ii) a comparison with the recent lightweight StereoNet refinement module (Khamis et al. 2018) and (iii) a new section, where we analyze the VN. To this end, we show how to compute eigen disparity maps that reveal structural properties of the learned regularizer and analyze the refined confidences in order to show the increased reliability of the confidences predicted by our model.…”
In this work, we propose a learning-based method to denoise and refine disparity maps. The proposed variational network arises naturally from unrolling the iterates of a proximal gradient method applied to a variational energy defined in a joint disparity, color, and confidence image space. Our method allows us to learn a robust collaborative regularizer leveraging the joint statistics of the color image, the confidence map, and the disparity map. Due to the variational structure of our method, the individual steps can be easily visualized, thus making the method interpretable. We can therefore provide interesting insights into how our method refines and denoises disparity maps. To this end, we can visualize and interpret the learned filters and activation functions and demonstrate the increased reliability of the predicted pixel-wise confidence maps. Furthermore, the optimization-based structure of our refinement module allows us to compute eigen disparity maps, which reveal structural properties of our refinement module. The efficiency of our method is demonstrated on the publicly available stereo benchmarks Middlebury 2014 and KITTI 2015.
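The core idea of the abstract above, unrolling a proximal gradient method into a fixed number of refinement steps, can be illustrated with a minimal sketch. The code below is not the authors' variational network: it uses a hand-crafted quadratic smoothness term in place of the learned collaborative regularizer, and operates on a 1-D disparity signal for brevity. The function name and all parameters are illustrative.

```python
import numpy as np

def unrolled_proximal_gradient(d_noisy, steps=10, alpha=0.4, lam=0.2):
    """Sketch of unrolling a proximal gradient method for the energy
    E(d) = 0.5 * ||d - d_noisy||^2 + lam * R(d).

    Here R is a simple quadratic smoothness term standing in for the
    learned collaborative regularizer; in the variational network each
    iteration instead applies learned filters and activation functions.
    """
    d = d_noisy.copy()
    for _ in range(steps):
        # gradient step on the data term
        d = d - alpha * (d - d_noisy)
        # "proximal" step on the regularizer: a discrete Laplacian
        # smoothing of the interior samples
        d[1:-1] = d[1:-1] + lam * (d[:-2] - 2 * d[1:-1] + d[2:])
    return d
```

Because every iteration is an explicit, simple update, intermediate results can be inspected after any step, which is exactly the interpretability property the abstract highlights.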
“…There are only a few filtering approaches that jointly consider guidance data as well as probabilities. Different filtering methods have been applied to refine semantic segmentations [50], optical flow [43,53], and especially depth [17,38,45]. However, these approaches are task-specific, tailored to certain filtering methods, and/or rely on time-intensive iterative approaches.…”
Encoder-decoder networks have found widespread use in various dense prediction tasks. However, the strong reduction of spatial resolution in the encoder leads to a loss of location information as well as boundary artifacts. To address this, image-adaptive post-processing methods have proven beneficial by leveraging the high-resolution input image(s) as guidance data. We extend such approaches by considering an important orthogonal source of information: the network's confidence in its own predictions. We introduce probabilistic pixel-adaptive convolutions (PPACs), which not only depend on image guidance data for filtering, but also respect the reliability of per-pixel predictions. As such, PPACs allow for image-adaptive smoothing while simultaneously propagating pixels of high confidence into less reliable regions, all while respecting object boundaries. We demonstrate their utility in refinement networks for optical flow and semantic segmentation, where PPACs lead to a clear reduction in boundary artifacts. Moreover, our proposed refinement step is able to substantially improve the accuracy on various widely used benchmarks.
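The combination of guidance-adaptive and confidence-weighted filtering described above can be sketched as a normalized convolution whose kernel is modulated by both a guidance signal and per-pixel confidence. This is a hand-crafted 1-D stand-in, not the actual PPAC layer (which uses learned kernels inside a network); the function name and parameters are illustrative.

```python
import numpy as np

def ppac_smooth(pred, guide, conf, radius=2, sigma_g=0.1):
    """Confidence- and guidance-weighted local averaging of a 1-D
    prediction. The window weight for each neighbor combines
    guidance similarity (image-adaptive part) with that neighbor's
    confidence (probabilistic part), then normalizes.
    """
    n = len(pred)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # kernel adapts to the guidance signal: neighbors with similar
        # guidance values (same side of an edge) get higher weight
        w = np.exp(-((guide[lo:hi] - guide[i]) ** 2) / (2 * sigma_g ** 2))
        # ... and is modulated by per-pixel confidence, so unreliable
        # predictions contribute little and get overwritten by
        # confident neighbors
        w = w * conf[lo:hi]
        out[i] = (w * pred[lo:hi]).sum() / (w.sum() + 1e-8)
    return out
```

With this weighting, a low-confidence outlier surrounded by confident neighbors is pulled toward the neighbors' values, while a guidance edge keeps smoothing from crossing object boundaries.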
“…However, the objective is fully handcrafted. Knöbelreiter and Pock recently proposed a refinement scheme where the regularizer in the optimization objective is trained using ground truth disparity maps [25]. Their model learns to jointly reason about image color, stereo matching confidence and disparity.…”
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images. This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting. We extend the DIP concept to depth images. Given color images and noisy, incomplete target depth maps, we optimize a randomly initialized CNN model to reconstruct a restored depth map, using the CNN network structure as a prior combined with a view-constrained photo-consistency loss, which is computed using images from a geometrically calibrated camera at nearby viewpoints. We apply this deep depth prior to inpaint and refine incomplete and noisy depth maps within both binocular and multi-view stereo pipelines. Our quantitative and qualitative evaluation shows that our refined depth maps are more accurate and complete, and after fusion, produce dense 3D models of higher quality.
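The mechanism in this abstract, fitting a structurally constrained model to only the observed depth values so that its inductive bias fills the holes, can be illustrated in miniature. The sketch below replaces the CNN with a low-frequency cosine basis playing the role of the structural prior, and uses a masked least-squares loss instead of photo-consistency; everything here (function name, basis choice, parameters) is an illustrative assumption, not the paper's method.

```python
import numpy as np

def fit_depth_prior(depth_obs, mask, n_basis=6, iters=500, lr=0.5):
    """Toy stand-in for the deep depth prior on a 1-D depth signal.

    A smooth cosine basis plays the role of the CNN's structural
    prior. Coefficients start at zero (cf. the DIP's random init)
    and are fit by gradient descent on a data loss over observed
    pixels only (mask == 1); missing pixels (mask == 0) are then
    filled in by the prior's extrapolation.
    """
    n = len(depth_obs)
    x = np.arange(n) / n
    # low-frequency basis functions: the hand-crafted "prior"
    B = np.stack([np.cos(np.pi * k * x) for k in range(n_basis)], axis=1)
    c = np.zeros(n_basis)
    for _ in range(iters):
        r = mask * (B @ c - depth_obs)      # residual on observed pixels
        c -= lr * (B.T @ r) / mask.sum()    # gradient step on masked loss
    return B @ c
```

The key point carried over from the DIP idea is that nothing is ever fit to the missing region directly; its reconstruction comes entirely from the model's structural bias.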