Abstract: We present an energy-based approach to visual odometry from RGB-D images of a Microsoft Kinect camera. To this end, we propose an energy function that aims at finding the best rigid body motion mapping one RGB-D image onto another, assuming a static scene filmed by a moving camera. We then propose a linearization of the energy function which leads to a 6 × 6 normal equation for the twist coordinates representing the rigid body motion. To allow for larger motions, we solve this equation in a coarse-to-fine s…
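The linear-algebra core described in this abstract — linearizing the energy and solving a 6 × 6 normal equation for the twist increment — can be sketched as follows. This is an illustrative Gauss-Newton step, not the paper's implementation; the function name, shapes, and the random residuals/Jacobian are assumptions, since in the actual method they would come from warping one RGB-D image into the other:

```python
import numpy as np

def solve_twist_increment(J, r):
    """One Gauss-Newton step: solve the 6 x 6 normal equations
    (J^T J) xi = -J^T r for the twist increment xi, where J is the
    N x 6 stacked Jacobian of the per-pixel residuals r with respect
    to the six twist coordinates (illustrative shapes)."""
    A = J.T @ J        # 6 x 6 normal matrix
    b = -J.T @ r       # 6-vector right-hand side
    return np.linalg.solve(A, b)

# Toy example with synthetic residuals and Jacobian:
rng = np.random.default_rng(0)
J = rng.standard_normal((1000, 6))
r = rng.standard_normal(1000)
xi = solve_twist_increment(J, r)
```

In a coarse-to-fine scheme, a step like this would be repeated on each pyramid level, warping with the current motion estimate before re-linearizing.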
“…The sequences are labeled with 6-DOF ground truth from a motion capture system having 10 cameras. Six research publications about evaluating ego-motion estimation and SLAM over TUM Benchmark dataset are [21,38,45,84,86,87].…”
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state-of-the-art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
“…However, they are not able to properly cope with large displacements between consecutive frames. In [13], it has been experimentally shown that the performance of the method degrades as the frame interval increases, which is equivalent to decreasing the frame rate of the acquisition, or increasing the sensor velocity.…”
Section: Introduction
“…Recently, odometry methods that take advantage of the depth and color information provided by RGB-D sensors have been developed [9,13]. They run in real-time and provide accurate estimations for high frame rate acquisitions and moderate sensor velocity.…”
Odometry is the use of data from a moving sensor to estimate change in position over time. It is a crucial step for several applications in robotics and computer vision. This paper presents a novel approach for estimating the relative motion between successive RGB-D frames that uses plane primitives instead of point features. The planes in the scene are extracted, and the motion estimation is cast as a plane-to-plane registration problem with a closed-form solution. Point features are only extracted in cases where the plane surface configuration is insufficient to determine motion without ambiguity. The initial estimate is refined in a photo-geometric optimization step that takes full advantage of the plane detection and the simultaneous availability of depth and visual appearance cues. Extensive experiments show that our plane-based approach is as accurate as state-of-the-art point-based approaches when the camera displacement is small, and significantly outperforms them in the case of wide-baseline motion and/or dynamic foreground.
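One standard closed-form solution to plane-to-plane registration splits the problem into rotation and translation: a Kabsch/SVD alignment of matched plane normals, followed by a linear system in the plane offsets. The sketch below shows this generic construction under assumed inputs (matched unit normals and offsets for at least three non-degenerate planes); the paper's exact formulation may differ:

```python
import numpy as np

def rotation_from_normals(src_n, dst_n):
    """Closed-form (Kabsch/SVD) rotation aligning source plane normals
    to destination plane normals. src_n, dst_n: (k, 3) arrays of unit
    normals, k >= 3 with non-degenerate directions."""
    H = src_n.T @ dst_n                        # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def translation_from_offsets(dst_n, d_src, d_dst):
    """After rotation, each plane n . x = d contributes one linear
    constraint n^T t = d_dst - d_src; solve the stacked system in
    least squares."""
    t, *_ = np.linalg.lstsq(dst_n, d_dst - d_src, rcond=None)
    return t
```

When the plane normals span fewer than three independent directions (e.g. a corridor with only two wall orientations), the system is rank-deficient, which is exactly the ambiguous configuration in which the paper falls back to point features.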
“…Accurate 3D reconstruction and mapping has been addressed as a vital topic and is playing a prominent role in such important research domains as 3D shape acquisition and modelling, surface generation and texturing, localization and robot vision (Engelhard et al., 2011; Newcombe et al., 2011a; Steinbrucker et al., 2011; Whelan et al., 2012a; Whelan et al., 2013). During recent years, the advent of powerful general-purpose GPUs has resulted in the first generation of real-time 3D-reconstruction applications which use depth data obtained from a low-cost Kinect depth sensor (PrimeSense; Kinect; Asus) to generate 3D geometry for relatively large and complex indoor environments (Newcombe et al., 2011b; Izadi et al., 2011; Bondarev et al., 2013; Whelan et al., 2012b; Whelan et al., 2012a; Whelan et al., 2013).…”
Abstract: In this paper, we report on experiments deploying an extended, distance-aware KinFu algorithm, designed to generate a 3D model from Kinect data, onto depth frames extracted from stereo camera data. The proposed idea overcomes the limitation that Kinect cannot be used for outdoor sensing due to IR interference with sunlight. Moreover, exploiting the stereo data enables a hybrid 3D reconstruction system capable of switching between Kinect depth frames and stereo data depending on the quality and quantity of the 3D and visual features in a scene. While the nature of stereo sensing and Kinect depth sensing is completely different, the stereo camera and the Kinect show similar sensitivity to capture distance. We have evaluated stereo-based 3D reconstruction with the extended KinFu algorithm using the following distance-aware weighting strategies: (a) weight definition to prioritize sensed data according to its accuracy, and (b) model updating to decide how strongly new data influences the existing 3D model. A qualitative comparison of the resulting outdoor 3D models shows higher accuracy and smoothness for models obtained with the introduced distance-aware strategies. The quantitative analysis reveals that applying the proposed weighting strategies to stereo datasets increases the robustness of the pose-estimation algorithm and its endurance by a factor of two.
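Distance-aware weighting of the kind described above can be sketched as a modification of the standard KinFu/TSDF running-average voxel update, where each new measurement is blended with a weight that decays with measured depth. The weight function, its characteristic range `d0`, and the weight cap below are illustrative assumptions, not values from the paper:

```python
def distance_weight(depth, d0=3.0):
    """Hypothetical distance-aware weight: confidence decays with
    measured depth, reflecting that both Kinect and stereo depth
    error grow with distance. d0 is an assumed characteristic range."""
    return 1.0 / (1.0 + (depth / d0) ** 2)

def fuse_voxel(tsdf, weight, sdf_new, depth, max_weight=64.0):
    """Weighted running-average TSDF update in the spirit of KinFu:
    the new signed-distance sample is blended in proportion to its
    estimated reliability, and the accumulated weight is capped so
    the model can still adapt to newer data."""
    w_new = distance_weight(depth)
    tsdf = (tsdf * weight + sdf_new * w_new) / (weight + w_new)
    weight = min(weight + w_new, max_weight)
    return tsdf, weight
```

Capping the accumulated weight is the standard mechanism that lets strategy (b) above trade off stability of the existing model against responsiveness to new measurements.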