This paper presents a calibration method for eye-in-hand systems that estimates the hand-eye and robot-world transformations. The estimation is formulated as the parametrization of a stochastic model. To achieve optimal performance, a metric on the group of rigid transformations SE(3) and a corresponding error model are proposed for nonlinear optimization. This novel metric works with both common formulations, AX=XB and AX=ZB, and applies them according to the nature of the problem. The metric also adapts to the precision characteristics of the system. The method's performance is compared with earlier approaches.
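Before any stochastic refinement of the kind described above, the AX = XB formulation admits a simple closed-form least-squares baseline. The sketch below is such a baseline, not the paper's method: the rotation is recovered by aligning the rotation axes of the motion pairs (Kabsch), and the translation by stacked linear least squares. All function and variable names are illustrative.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def solve_ax_xb(As, Bs):
    """Least-squares solve for X in A_i X = X B_i.

    As, Bs: lists of 4x4 homogeneous transforms (robot and camera
    motions). Since R_A = R_X R_B R_X^T, the rotation axis of each A_i
    is the rotation axis of B_i rotated by R_X; R_X is found by
    aligning these axes (Kabsch). The translation then satisfies
    (R_A - I) t_X = R_X t_B - t_A, stacked over all motion pairs.
    """
    # Rotation axes (as rotation vectors) of the two motion sequences.
    a = np.array([Rotation.from_matrix(A[:3, :3]).as_rotvec() for A in As])
    b = np.array([Rotation.from_matrix(B[:3, :3]).as_rotvec() for B in Bs])
    # Kabsch: R_X maximizing sum_i a_i . (R_X b_i).
    H = b.T @ a
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    Rx = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Translation: stacked linear least squares.
    M = np.vstack([A[:3, :3] - np.eye(3) for A in As])
    v = np.concatenate([Rx @ B[:3, 3] - A[:3, 3] for A, B in zip(As, Bs)])
    tx = np.linalg.lstsq(M, v, rcond=None)[0]
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = Rx, tx
    return X
```

At least two motion pairs with non-parallel rotation axes are required for the rotation to be uniquely determined.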
In many applications of advanced robotic manipulation, six-degree-of-freedom (6DoF) object pose estimates are continuously required. In this work, we develop a multimodality tracker that fuses visual appearance and geometry to estimate object poses. The algorithm extends our previous method, ICG, which uses geometry, to additionally consider surface appearance. In general, object surfaces contain local characteristics from text, graphics, and patterns, as well as global differences from distinct materials and colors. To incorporate this visual information, two modalities are developed. For local characteristics, keypoint features are used to minimize distances between points from keyframes and the current image. For global differences, a novel region approach is developed that considers multiple regions on the object surface. In addition, it allows the modeling of external geometries. Experiments on the YCB-Video and OPT datasets demonstrate that our approach, ICG+, performs best on both datasets, outperforming both conventional and deep learning-based methods. At the same time, the algorithm is highly efficient, running at more than 300 Hz. The source code of our tracker is publicly available.
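As a minimal illustration of the keypoint idea, distances between matched points can be minimized in closed form when both sets of keypoints are available in 3-D (e.g. back-projected with depth). The sketch below is a generic rigid fit, not the ICG+ implementation; the names are illustrative.

```python
import numpy as np


def fit_rigid(P, Q):
    """Least-squares rigid transform (R, t) with Q ≈ R @ P + t.

    P, Q: (N, 3) arrays of matched 3-D keypoints, e.g. back-projected
    keyframe features and their counterparts in the current frame.
    Uses the standard SVD (Kabsch) solution on centered point sets.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of the centered correspondences.
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction guarantees a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

At least three non-collinear correspondences are needed; in practice the fit is wrapped in an outlier-rejection scheme such as RANSAC.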
In the context of 3-D scene modeling, this work aims at accurate, real-time, passive estimation of the pose of a close-range 3-D modeling device from its own images. This novel development makes it possible to dispense with inconvenient, expensive external positioning systems. The approach comprises an ego-motion algorithm that tracks natural, distinctive features, concurrently with customary 3-D modeling of the scene. The use of stereo vision, an inertial measurement unit, and robust cost functions for pose estimation further increases performance. Demonstrations and abundant video material validate the approach.
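Robust cost functions of the kind mentioned above down-weight outlier residuals so that mismatched features do not corrupt the pose estimate. As a minimal sketch, here is the Huber cost solved by iteratively reweighted least squares on a scalar toy problem (robust estimation of a mean); this is an illustrative stand-in, not the paper's formulation.

```python
import numpy as np


def huber_weight(r, delta):
    # IRLS weights for the Huber cost: quadratic (weight 1) for small
    # residuals, linear (weight delta/|r|) beyond the threshold delta.
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / a)


def robust_mean(x, delta=1.0, iters=50):
    """Huber-robust mean of a 1-D sample via IRLS, seeded at the median."""
    mu = np.median(x)
    for _ in range(iters):
        w = huber_weight(x - mu, delta)
        mu = np.sum(w * x) / np.sum(w)
    return mu
```

The same reweighting pattern applies to reprojection residuals in pose estimation: each Gauss-Newton iteration scales the residuals by the Huber weights before solving the normal equations.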
Region-based methods have become increasingly popular for model-based, monocular 3D tracking of texture-less objects in cluttered scenes. However, while they achieve state-of-the-art results, most methods are computationally expensive, requiring significant resources to run in real-time. In the following, we build on our previous work and develop SRT3D, a sparse region-based approach to 3D object tracking that bridges this gap in efficiency. Our method considers image information sparsely along so-called correspondence lines that model the probability of the object’s contour location. We thereby improve on the current state of the art and introduce smoothed step functions that consider a defined global and local uncertainty. For the resulting probabilistic formulation, a thorough analysis is provided. Finally, we use a pre-rendered sparse viewpoint model to create a joint posterior probability for the object pose. The function is maximized using second-order Newton optimization with Tikhonov regularization. During the pose estimation, we differentiate between global and local optimization, using a novel approximation for the first-order derivative employed in the Newton method. In multiple experiments, we demonstrate that the resulting algorithm improves the current state of the art both in terms of runtime and quality, performing particularly well for noisy and cluttered images encountered in the real world.
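Two building blocks named above, a smoothed step function and a Tikhonov-regularized Newton update, can be sketched generically. The exact functional forms in SRT3D differ; the tanh-based step and the parameter names below are illustrative assumptions.

```python
import numpy as np


def smoothed_step(x, s):
    # Smoothed step mapping contour distance x to a probability in
    # [0, 1]; the slope parameter s models the assumed uncertainty.
    return 0.5 * (1.0 + np.tanh(x / (2.0 * s)))


def newton_step(grad, hess, lam=1e-3):
    # Tikhonov-regularized second-order Newton update:
    #   delta = -(H + lam * I)^{-1} g
    # The regularizer lam keeps the step well-conditioned when the
    # Hessian is near-singular, at the cost of slightly damped steps.
    H = hess + lam * np.eye(hess.shape[0])
    return -np.linalg.solve(H, grad)
```

Iterating the damped step on a maximized posterior converges to the unregularized optimum as the gradient vanishes, since the regularizer only shrinks the step, not the fixed point.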
Fig. 1. Top row: ORB-SLAM2 [1] tracks on KITTI [2] images. Middle row: ORB-SLAM2 tracks with DOT segmentation masks, which differentiate between moving and static objects. Bottom row: ORB-SLAM2 tracks using Detectron2 [3] segmentation masks, encoding all potentially dynamic objects. Note how DOT segments out actually moving objects (e.g., moving cars), while keeping the static ones (e.g., parked cars).