Abstract. Ambitious driver assistance for complex urban scenarios demands a complete awareness of the situation, including all moving and stationary objects that limit the free space. Recent progress in real-time dense stereo vision provides precise depth information for nearly every pixel of an image. This raises new questions: How can one efficiently analyze half a million disparity values from next-generation imagers? And how can one find all relevant obstacles in this huge amount of data in real time? In this paper we introduce a medium-level representation named the "stixel-world". It takes into account that the free space in front of vehicles is limited by objects with almost vertical surfaces. These surfaces are approximated by adjacent rectangular sticks of a certain width and height. The stixel-world turns out to be a compact but flexible representation of the three-dimensional traffic situation that can be used as the common basis for the scene understanding tasks of driver assistance and autonomous systems.
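As an illustration of the idea, here is a minimal Python sketch of a stixel representation. All names and the column-wise grow-upwards heuristic are hypothetical simplifications, not the authors' exact algorithm: each block of image columns is summarized by one vertical stick standing on the free-space boundary.

```python
# Hypothetical, simplified stixel construction (not the paper's exact method).
from dataclasses import dataclass
import numpy as np

@dataclass
class Stixel:
    column: int        # leftmost image column of the stick (pixels)
    width: int         # stick width in pixels
    base_row: int      # bottom of the stick (row of the free-space limit)
    top_row: int       # top of the stick (upper object boundary)
    disparity: float   # representative disparity of the stick (encodes distance)

def build_stixels(disparity, base_rows, width=5, min_disp=1.0, disp_tol=1.0):
    """Summarize a dense HxW disparity image as adjacent vertical sticks.

    base_rows: per-column row index of the free-space limit (assumed given,
    e.g. from an occupancy-grid analysis of the disparity data).
    """
    h, w = disparity.shape
    stixels = []
    for u in range(0, w - width + 1, width):
        base = int(np.median(base_rows[u:u + width]))
        col_disp = np.median(disparity[:, u:u + width], axis=1)  # per-row median
        d_obj = float(col_disp[base])
        if d_obj < min_disp:
            continue  # nothing close enough in this column block
        # Grow upwards while the disparity stays close to the object's disparity.
        top = base
        while top > 0 and abs(col_disp[top - 1] - d_obj) < disp_tol:
            top -= 1
        stixels.append(Stixel(u, width, base, top, d_obj))
    return stixels
```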
Abstract-One of the fundamental requirements of an autonomous vehicle is the ability to determine its location on a map. Frequently, solutions to this localization problem rely on GPS information or use expensive three-dimensional (3D) sensors. In this paper, we describe a method for long-term vehicle localization based on visual features alone. Our approach utilizes a combination of topological and metric mapping, which we call topometric localization, to encode the coarse topology of the route as well as the detailed metric information required for accurate localization. A topometric map is created by driving the route once and recording a database of visual features. The vehicle then localizes by matching features to this database at runtime. Since individual feature matches are unreliable, we employ a discrete Bayes filter to estimate the most likely vehicle position using evidence from a sequence of images along the route. We illustrate the approach using an 8.8 km route through an urban and suburban environment. The method achieves an average localization error of 2.7 m over this route, with isolated worst-case errors on the order of 10 m.
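To make the filtering step concrete, the following is a minimal sketch of a discrete Bayes (histogram) filter over route positions, assuming a simple roll-and-blur motion model and an externally supplied per-position match likelihood; the paper's exact motion and measurement models are not reproduced here.

```python
# Hypothetical discrete Bayes filter over route bins (a sketch, not the paper's model).
import numpy as np

def predict(belief, motion_bins, spread=2):
    """Shift the belief by the odometry estimate and blur it to model motion uncertainty."""
    shifted = np.roll(belief, motion_bins)
    kernel = np.ones(2 * spread + 1) / (2 * spread + 1)
    return np.convolve(shifted, kernel, mode="same")

def update(belief, likelihood):
    """Fuse a per-bin measurement likelihood (e.g. feature-match scores) and renormalize."""
    posterior = belief * likelihood
    s = posterior.sum()
    return posterior / s if s > 0 else np.full_like(belief, 1.0 / belief.size)

# Typical usage per incoming image:
#   belief = update(predict(belief, bins_moved), match_scores)
# The most likely position is then the bin with the highest posterior probability.
```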
Abstract-Autonomous vehicles must be capable of localizing even in GPS-denied situations. In this paper, we propose a real-time method to localize a vehicle along a route using visual imagery or range information. Our approach is an implementation of topometric localization, which combines the robustness of topological localization with the geometric accuracy of metric methods. We construct a map by navigating the route using a GPS-equipped vehicle and building a compact database of simple visual and 3D features. We then localize using a Bayesian filter to match sequences of visual or range measurements to the database. The algorithm is reliable across wide environmental changes, including lighting differences, seasonal variations, and occlusions, achieving an average localization accuracy of 1 m over an 8 km route. The method converges correctly even with wrong initial position estimates, solving the kidnapped robot problem.
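One simple way to turn database matches into the per-position evidence such a filter needs is a Gaussian score on descriptor distance, sketched below with hypothetical descriptors; the paper's actual visual and range features are different.

```python
# Hypothetical measurement likelihood from matching against the route database.
import numpy as np

def measurement_likelihood(query_desc, db_descs, sigma=0.5):
    """db_descs: (N, D) array with one descriptor per map position along the route."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    lik = np.exp(-0.5 * (dists / sigma) ** 2)
    return lik / lik.sum()

# Re-initializing the belief to this normalized likelihood when it collapses is one
# simple way to recover from a wrong initial estimate (the kidnapped-robot case).
```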
This paper presents a novel stereo-based visual odometry approach that provides state-of-the-art results in real time, both indoors and outdoors. Our proposed method follows the procedure of computing optical flow and stereo disparity to minimize the re-projection error of tracked feature points. However, instead of following the traditional approach of performing this task using only consecutive frames, we propose a novel and computationally inexpensive technique that uses the whole history of the tracked feature points to compute the motion of the camera. In our technique, which we call multi-frame feature integration, the features measured and tracked over all past frames are integrated into a single, improved estimate. An augmented feature set, composed of the improved estimates, is added to the optimization algorithm, improving the accuracy of the computed motion and reducing ego-motion drift. Experimental results show that the proposed approach reduces pose error by up to 65% with a negligible additional computational cost of 3.8%. Furthermore, our algorithm outperforms all other known methods on the KITTI Vision Benchmark data set.
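The core optimization, minimizing the re-projection error of tracked feature points over a 6-DoF pose, can be sketched as follows (hypothetical helper names, SciPy-based; the paper's multi-frame feature integration additionally folds each feature's full track into an improved estimate, which is omitted here).

```python
# Sketch of re-projection-error minimization for camera motion estimation.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points_3d, pose, K):
    """Project 3D points (N, 3) into the image; pose = [rx, ry, rz, tx, ty, tz]."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    cam = points_3d @ R.T + pose[3:]
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def residuals(pose, points_3d, observations, K):
    """Stacked pixel errors between projected points and tracked 2D observations."""
    return (project(points_3d, pose, K) - observations).ravel()

def estimate_motion(points_3d, observations, K, pose0=None):
    """Least-squares pose that minimizes the re-projection error."""
    if pose0 is None:
        pose0 = np.zeros(6)
    return least_squares(residuals, pose0, args=(points_3d, observations, K)).x
```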
Figure 1: Left: Our xR-EgoPose Dataset setup: (a) external camera viewpoint showing a synthetic character wearing the headset; (b) example of a photorealistic image rendered from the egocentric camera perspective; (c) 2D and (d) 3D poses estimated with our algorithm. Right: results on real images; (e) real image acquired with our HMD-mounted camera with predicted 2D heatmaps; (f) estimated 3D pose, showing good generalization to real images.
Abstract. We present a new solution to egocentric 3D body pose estimation from monocular images captured by a downward-looking fish-eye camera installed on the rim of a head-mounted virtual reality device. This unusual viewpoint, just 2 cm away from the user's face, leads to images with a unique visual appearance, characterized by severe self-occlusions and strong perspective distortions that result in a drastic difference in resolution between the lower and upper body. Our contribution is two-fold. First, we propose a new encoder-decoder architecture with a novel dual-branch decoder designed specifically to account for the varying uncertainty in the 2D joint locations. Our quantitative evaluation, on both synthetic and real-world datasets, shows that our strategy leads to substantial improvements in accuracy over state-of-the-art egocentric pose estimation approaches. Our second contribution is a new large-scale photorealistic synthetic dataset, xR-EgoPose, offering 383K frames of high-quality renderings of people with a diversity of skin tones, body shapes, and clothing, in a variety of backgrounds and lighting conditions, performing a range of actions. Our experiments show that the high variability in our new synthetic training corpus leads to good generalization to real-world footage and to state-of-the-art results on real-world datasets with ground truth. Moreover, an evaluation on the Human3.6M benchmark shows that the performance of our method is on par with top-performing approaches on the more classic problem of 3D human pose from a third-person viewpoint.
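A minimal PyTorch sketch of the dual-branch decoder idea is shown below (layer sizes and dimensions are assumptions, not the paper's architecture): one branch regresses the 3D pose from a latent code while a second branch reconstructs the 2D heatmaps, so the code retains the per-joint uncertainty the heatmaps encode.

```python
# Hypothetical dual-branch decoder sketch; sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

class DualBranchDecoder(nn.Module):
    def __init__(self, num_joints=15, heatmap_size=48, latent_dim=20):
        super().__init__()
        hm_dim = num_joints * heatmap_size * heatmap_size
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(hm_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim))
        self.pose_branch = nn.Sequential(          # latent code -> 3D joints
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_joints * 3))
        self.heatmap_branch = nn.Sequential(        # latent code -> heatmap reconstruction
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, hm_dim))
        self.num_joints = num_joints
        self.heatmap_size = heatmap_size

    def forward(self, heatmaps):
        z = self.encoder(heatmaps)                  # heatmaps: (B, J, H, W)
        pose3d = self.pose_branch(z).view(-1, self.num_joints, 3)
        rec = self.heatmap_branch(z).view(-1, self.num_joints,
                                          self.heatmap_size, self.heatmap_size)
        return pose3d, rec
```

Training both branches jointly forces the latent code to preserve the heatmap uncertainty rather than collapsing to a single point estimate per joint, which is the stated motivation for the dual-branch design.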
Abstract-The fast and accurate computation of surface normals from a point cloud is a critical step for many 3D robotics and automotive problems, including terrain estimation, mapping, navigation, object segmentation, and object recognition. To obtain the tangent plane to the surface at a point, the traditional approach applies total least squares to its small neighborhood. However, least squares becomes computationally very expensive when applied to the millions of measurements per second that current range sensors can generate. We reformulate the traditional least squares solution to allow the fast computation of surface normals, and propose a new approach that obtains the normals by calculating the derivatives of the surface from a spherical range image. Furthermore, we show that the traditional least squares problem is very sensitive to range noise and must be normalized to obtain accurate results. Experimental results with synthetic and real data demonstrate that our proposed method is not only up to two orders of magnitude more efficient, but also more accurate than the traditional least squares for practical neighborhood sizes.
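For reference, the traditional total-least-squares normal that the paper improves upon can be sketched as below (the faster spherical-range-image derivative method itself is not reproduced): the normal is the eigenvector of the neighborhood covariance with the smallest eigenvalue.

```python
# Sketch of the classical total-least-squares surface normal for one query point.
import numpy as np

def tls_normal(neighborhood, viewpoint=None):
    """neighborhood: (k, 3) array of 3D points around the query point."""
    if viewpoint is None:
        viewpoint = np.zeros(3)
    centroid = neighborhood.mean(axis=0)
    centered = neighborhood - centroid
    cov = centered.T @ centered / len(neighborhood)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # direction of least variance
    # Orient the normal toward the sensor so signs are consistent across points.
    if normal @ (viewpoint - centroid) < 0:
        normal = -normal
    return normal
```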