In this paper, we present a monocular visual-inertial odometry algorithm which, by directly using pixel intensity errors of image patches, achieves accurate tracking performance while exhibiting a very high level of robustness. After detection, the tracking of the multilevel patch features is closely coupled to the underlying extended Kalman filter (EKF) by directly using the intensity errors as the innovation term during the update step. We follow a purely robocentric approach where the locations of 3D landmarks are always estimated with respect to the current camera pose. Furthermore, we decompose landmark positions into a bearing vector and a distance parametrization, whereby we employ a minimal representation of differences on a corresponding σ-algebra in order to achieve better consistency and to improve the computational performance. Due to the robocentric, inverse-distance landmark parametrization, the framework does not require any initialization procedure, leading to a truly power-up-and-go state estimation system. The presented approach is successfully evaluated in a set of highly dynamic hand-held experiments as well as directly employed in the control loop of a multirotor unmanned aerial vehicle (UAV).
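As a rough illustration of the update described above, the sketch below shows an EKF correction step in which the stacked pixel intensity errors of a tracked patch form the innovation vector directly. The function names, state layout, and noise model are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def photometric_ekf_update(x, P, image, patch_ref, predict_patch, H, R):
    """EKF update where the innovation is the vector of pixel intensity
    errors between a stored reference patch and its predicted appearance
    in the current image (no explicit feature matching step).

    x, P          : state mean and covariance
    patch_ref     : reference patch intensities, shape (n_pixels,)
    predict_patch : maps (state, image) -> predicted intensities (warp + interpolate)
    H             : Jacobian of predicted intensities w.r.t. the state
    R             : pixel intensity noise covariance, shape (n_pixels, n_pixels)
    """
    y = patch_ref - predict_patch(x, image)  # photometric innovation
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ y                            # state correction
    P = (np.eye(len(x)) - K @ H) @ P         # covariance update
    return x, P
```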
This paper introduces ANYmal, a quadrupedal robot that features outstanding mobility and dynamic motion capability. Thanks to novel, compliant joint modules with integrated electronics, the 30 kg, 0.5 m tall robotic dog is torque controllable and very robust against impulsive loads during running or jumping. The presented machine was designed with a focus on outdoor suitability, simple maintenance, and user-friendly handling to enable future operation in real-world scenarios. Performance tests with the joint actuators indicated a torque control bandwidth of more than 70 Hz, high disturbance rejection capability, as well as impact robustness when moving at maximal velocity. It is demonstrated in a series of experiments that ANYmal can execute walking gaits, dynamically trot at moderate speed, and perform special maneuvers to stand up or crawl up very steep stairs. Detailed measurements reveal that even full-speed running requires less than 280 W, resulting in an autonomy of more than 2 h.
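The torque-controllable compliant joints suggest a series-elastic actuation scheme, where joint torque is inferred from the deflection of a spring and regulated in a feedback loop. The sketch below shows that generic idea only; the stiffness, gains, and structure are placeholders, not ANYmal's actual joint controller.

```python
# Generic series-elastic actuator torque loop (illustrative sketch,
# not ANYmal/ANYdrive firmware). Torque is measured via spring deflection
# and regulated with a simple PI controller.

K_SPRING = 300.0    # spring stiffness [Nm/rad] (placeholder value)
KP, KI = 5.0, 80.0  # PI gains (placeholder values)

def torque_control_step(theta_motor, theta_joint, tau_des, integ, dt):
    tau_meas = K_SPRING * (theta_motor - theta_joint)  # torque from deflection
    err = tau_des - tau_meas
    integ += err * dt
    motor_cmd = KP * err + KI * integ  # command to the motor drive
    return motor_cmd, integ
```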
This paper presents a visual-inertial odometry framework which tightly fuses inertial measurements with visual data from one or more cameras, by means of an iterated extended Kalman filter (IEKF). By employing image patches as landmark descriptors, a photometric error is derived, which is directly integrated as an innovation term in the filter update step. Consequently, the data association is an inherent part of the estimation process and no additional feature extraction or matching processes are required. Furthermore, it enables the tracking of non-corner-shaped features, such as lines, and thereby increases the set of possible landmarks. The filter state is formulated in a fully robocentric fashion, which reduces errors related to nonlinearities. This also includes the partitioning of a landmark's location estimate into a bearing vector and a distance, which allows an undelayed initialization of landmarks. Overall, this results in a compact approach which exhibits a high level of robustness with respect to low scene texture and motion blur. Furthermore, there is no time-consuming initialization procedure and pose estimates are available starting at the second image frame. We test the filter on different real datasets and compare it to other state-of-the-art visual-inertial frameworks. The experimental results show that robust localization with high accuracy can be achieved with this filter-based framework.
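The iterated EKF update mentioned here can be read as a Gauss-Newton iteration that re-linearizes the (photometric) measurement model around the refined estimate rather than only around the prior. A minimal sketch of that update, with all names illustrative:

```python
import numpy as np

def iekf_update(x0, P, z, h, jac_h, R, iters=5):
    """Iterated EKF update: re-linearize the measurement model h(x)
    around the current refined estimate at each iteration.

    x0, P : prior state mean and covariance
    z     : measurement (here: stacked patch intensities)
    h     : measurement prediction function
    jac_h : Jacobian of h w.r.t. the state
    """
    x = x0.copy()
    for _ in range(iters):
        H = jac_h(x)
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        # The extra H @ (x0 - x) term accounts for the shifted linearization point.
        x = x0 + K @ (z - h(x) - H @ (x0 - x))
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```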
This paper introduces a state estimation framework for legged robots that allows estimating the full pose of the robot without making any assumptions about the geometrical structure of its environment. This is achieved by means of an Observability Constrained Extended Kalman Filter that fuses kinematic encoder data with on-board IMU measurements. By including the absolute position of all footholds into the filter state, simple model equations can be formulated which accurately capture the uncertainties associated with the intermittent ground contacts. The resulting filter simultaneously estimates the position of all footholds and the pose of the main body. In the algorithmic formulation, special attention is paid to the consistency of the linearized filter: it maintains the same observability properties as the nonlinear system, which is a prerequisite for accurate state estimation. The presented approach is implemented in simulation and validated experimentally on an actual quadrupedal robot.
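To make the foothold-in-the-state idea concrete, the following sketch shows how a kinematic measurement residual over the feet currently in contact could look. The state layout and the forward-kinematics helper `fk` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Illustrative state layout for a foothold-augmented EKF: body position r,
# body orientation R_wb (world-from-body rotation), and the absolute world
# position of each foothold. Leg kinematics provide a relative measurement
# of each contacting foot's position in the body frame.

def kinematic_residuals(r, R_wb, footholds, q_joints, fk, contacts):
    """Stacked residuals between filter-predicted and kinematics-measured
    foot positions for all feet in contact.

    fk(q_joints, i) returns foot i's position in the body frame
    as computed from the joint encoders.
    """
    residuals = []
    for i in contacts:
        p_pred_body = R_wb.T @ (footholds[i] - r)  # filter prediction
        p_meas_body = fk(q_joints, i)              # encoder kinematics
        residuals.append(p_meas_body - p_pred_body)
    return np.concatenate(residuals)
```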
The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation. We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images, and auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: while each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image allows the code to represent only those aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM.
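A hedged sketch of the joint refinement the abstract describes: a compact latent code and a camera pose are optimized together against a photometric error, with the depth map decoded from the keyframe image and the code. Here `decoder` and `warp` stand in for the learned network and a differentiable reprojection step; all names are illustrative.

```python
import torch

def refine(decoder, image_kf, image_j, code, pose, warp, steps=50, lr=1e-2):
    """Jointly optimize a latent geometry code and a relative camera pose
    against a photometric error. `decoder(image, code)` yields a depth map
    conditioned on both inputs; `warp(image_j, depth, pose)` reprojects the
    other frame into the keyframe. Both are assumed differentiable."""
    code = code.detach().clone().requires_grad_(True)
    pose = pose.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([code, pose], lr=lr)
    for _ in range(steps):
        depth = decoder(image_kf, code)              # depth from image + code
        residual = image_kf - warp(image_j, depth, pose)
        loss = residual.pow(2).mean()                # photometric error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return code.detach(), pose.detach()
```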
With the introduction of the Microsoft Kinect for Windows v2 (Kinect v2), an exciting new sensor is available to robotics and computer vision researchers. Similar to the original Kinect, the sensor is capable of acquiring accurate depth images at high rates. This is useful for robot navigation as dense and robust maps of the environment can be created. Unlike the original Kinect, which works with structured-light technology, the Kinect v2 is based on the time-of-flight measurement principle and can also be used outdoors in sunlight. In this paper, we evaluate the application of the Kinect v2 depth sensor for mobile robot navigation. The results of calibrating the intrinsic camera parameters are presented and the minimal range of the depth sensor is examined. We analyze the data quality of the measurements indoors and outdoors in overcast and direct sunlight situations. To this end, we introduce empirically derived noise models for the Kinect v2 sensor in both axial and lateral directions. The noise models take the measurement distance, the angle of the observed surface, and the sunlight incidence angle into account. These models can be used in post-processing to filter the Kinect v2 depth images for a variety of applications.
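Such noise models could be applied roughly as sketched below: a distance- and angle-dependent standard deviation drives a post-processing filter that discards unreliable pixels. The functional form and coefficients here are placeholders in the spirit of the paper's description, not its fitted values.

```python
import numpy as np

def axial_noise_std(z, theta, c=(0.0012, 0.0019, 0.0001)):
    """Hypothetical axial noise model: standard deviation [m] grows with
    measurement distance z [m] and with the surface angle theta [rad]
    relative to the optical axis. Coefficients are placeholders, not the
    paper's empirically fitted values."""
    c0, c1, c2 = c
    return c0 + c1 * z**2 + c2 * z**1.5 * theta**2 / (np.pi / 2 - theta)**2

def filter_depth(depth, angles, max_std=0.02):
    # Post-processing: mask out pixels whose modeled noise exceeds a threshold.
    std = axial_noise_std(depth, angles)
    return np.where(std < max_std, depth, np.nan)
```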
We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and, at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse the corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, ours is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera that can be used directly for robotic tasks. Our method runs at 2-3 Hz on a CPU, excluding the instance segmentation part. We demonstrate its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences.
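The per-frame pipeline the abstract outlines can be summarized as in the following sketch. Every helper is passed in as a parameter and is purely illustrative; this is a structural outline, not the authors' code.

```python
def process_frame(rgb, depth, models, cam_pose,
                  segment, refine, track_obj, track_cam, associate, fuse):
    """One iteration of an object-level dynamic RGB-D SLAM loop, following
    the pipeline described in the abstract."""
    masks = segment(rgb)                                # instance segmentation
    masks = refine(masks, depth)                        # geometric + motion refinement
    for obj in models:
        if obj.moving:
            obj.pose = track_obj(obj, rgb, depth)       # per-object tracking
    cam_pose = track_cam(rgb, depth, models, cam_pose)  # camera vs. static scene
    for mask, obj in associate(masks, models, cam_pose):
        fuse(obj, rgb, depth, mask)                     # incremental volumetric fusion
    return models, cam_pose
```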