Dense pixel matching is important for many computer vision tasks, such as disparity and flow estimation. We present a robust, unified descriptor network that considers a large context region with high spatial variance. Our network has a very large receptive field and avoids striding layers to maintain spatial resolution. These properties are achieved with a novel neural network layer that consists of multiple parallel, stacked dilated convolutions (SDC). Several of these layers are combined to form our SDC descriptor network. In our experiments, we show that SDC features outperform state-of-the-art feature descriptors in terms of accuracy and robustness. In addition, we demonstrate the superior performance of SDC in state-of-the-art stereo matching, optical flow, and scene flow algorithms on several well-known public benchmarks.
Figure 2: Our SDC feature network. It consists of 5 SDC blocks with a varying number of output channels; the final feature vectors are normalized to unit range pixel-wise. The first SDC layer applies 4 parallel 5×5 convolutions with 16 kernels each and dilation rates 1, 2, 3, 4 to an M×N×3 color image, producing an M×N×64 feature map that feeds the second SDC layer.
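The core idea of an SDC layer is to run several dilated convolutions of the same kernel size in parallel and concatenate their outputs, which enlarges the receptive field without striding. The following is a minimal numpy sketch of the first SDC layer using the configuration stated in Figure 2 (4 parallel 5×5 convolutions, 16 kernels each, dilation rates 1–4); the function names and random weights are illustrative, not the authors' implementation.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """'Same'-padded 2-D dilated convolution.
    x: (H, W, C_in), kernel: (k, k, C_in, C_out)."""
    k = kernel.shape[0]
    pad = dilation * (k - 1) // 2           # keep spatial resolution
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, kernel.shape[3]))
    for i in range(k):                       # accumulate one tap at a time,
        for j in range(k):                   # sampled at the dilated offsets
            patch = xp[i * dilation:i * dilation + H,
                       j * dilation:j * dilation + W, :]
            out += np.einsum('hwc,co->hwo', patch, kernel[i, j])
    return out

def sdc_layer(x, kernels, dilations):
    """Parallel stacked dilated convolutions: run each dilated conv on the
    same input and concatenate the results along the channel axis."""
    outs = [dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)]
    return np.concatenate(outs, axis=-1)

# First SDC layer per Figure 2: input M×N×3, 4 parallel 5×5 convs,
# 16 kernels each, dilations [1, 2, 3, 4] -> output M×N×64.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))         # stand-in color image
kernels = [rng.standard_normal((5, 5, 3, 16)) * 0.01 for _ in range(4)]
y = sdc_layer(x, kernels, dilations=[1, 2, 3, 4])
print(y.shape)  # (32, 32, 64)
```

Because all four branches use 'same' padding, the output keeps the input's spatial resolution, and the channel dimension is the sum of the branch widths (4 × 16 = 64), matching the layer specification in the figure.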
In order to improve the performance of correlation-based disparity computation in stereo vision algorithms, standard methods must choose the value of the maximum disparity (MD) in advance. This value corresponds to the maximum expected displacement of the projection of a physical point between the two images, and it generally depends on the motion model, the camera intrinsic parameters, and the depths of the observed scene. In this paper, we show that there is no optimal MD value that minimizes the matching errors in all image regions simultaneously, and we propose a novel approach to disparity computation that does not rely on any a priori MD. Two variants of this approach are presented. Compared to traditional correlation-based methods, our approach improves not only the accuracy of the results but also the efficiency of the algorithm. A local energy minimization is also proposed for fast refinement of the results. An extensive comparative study with ground truth, carried out on classical stereo images, shows that the proposed method gives clearly more accurate results and is two times faster than the fastest possible implementation of traditional correlation-based methods.
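To make the baseline concrete: the classical correlation-based methods discussed above score, for each pixel, every candidate disparity up to a fixed MD with a window-based correlation measure and keep the best one. The following is a minimal, naive sketch of that baseline (SAD cost, rectified grayscale pair); it is not the paper's proposed MD-free method, and the function name and parameters are illustrative.

```python
import numpy as np

def block_match(left, right, max_disp, win=3):
    """Classical correlation-based block matching with a fixed
    maximum disparity MD (the a-priori choice the paper avoids).
    left, right: rectified 2-D grayscale arrays. O(H*W*MD*win^2)."""
    H, W = left.shape
    r = win // 2
    disp = np.zeros((H, W), dtype=int)
    for y in range(r, H - r):
        for x in range(r, W - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            best, best_d = np.inf, 0
            # only disparities whose window stays inside the right image
            for d in range(min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = np.abs(patch - cand).sum()   # SAD correlation score
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic sanity check: the right view is the scene shifted by 4 pixels,
# so interior pixels should recover disparity 4.
rng = np.random.default_rng(1)
d_true, H, W = 4, 20, 30
base = rng.standard_normal((H, W + d_true))
left, right = base[:, :W], base[:, d_true:]
disp = block_match(left, right, max_disp=8)
print(disp[10, 15])  # 4
```

The key weakness the paper targets is visible in the signature: `max_disp` must be fixed in advance, yet no single value is optimal for every image region.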
Abstract. High-precision ground truth data is a very important factor in the development and evaluation of computer vision algorithms, especially for advanced driver assistance systems. Unfortunately, some types of data, such as accurate optical flow, depth, and pixel-wise semantic annotations, are very difficult to obtain. To address this problem, in this paper we present a new framework for the generation of high-quality synthetic camera images, depth and optical flow maps, and pixel-wise semantic annotations. The framework is based on a realistic driving simulator called VDrift [1], which allows us to create traffic scenarios very similar to those in real life. We show how the proposed framework can be used to generate an extensive dataset for the task of multi-class image segmentation. We use the dataset to train a pairwise CRF model and to analyze the effects of using various combinations of features in different image modalities.
State-of-the-art scene flow algorithms pursue the conflicting goals of accuracy, run time, and robustness. Building on the successful concept of pixel-wise matching and sparse-to-dense interpolation, we shift the operating point in this trade-off towards universality and speed. Avoiding strong assumptions about the domain or the problem yields a more robust algorithm. The algorithm is fast because we avoid explicit regularization during matching, which allows an efficient computation. Using image information from multiple time steps and explicit visibility prediction based on previous results, we achieve competitive performance on different datasets. Our contributions and results are evaluated in comparative experiments. Overall, we present an accurate scene flow algorithm that is faster and more generic than any individual benchmark leader.