Whenever a sensor is mounted on a robot hand, it is important to know the relationship between the sensor and the hand. The problem of determining this relationship is referred to as the hand-eye calibration problem. Hand-eye calibration is important in at least two types of tasks: (1) tasks that map sensor-centered measurements into the robot workspace frame and (2) tasks that allow the robot to precisely move the sensor. Several solutions have been proposed in the past, particularly for the case where the sensor is a television camera. With almost no exception, all existing solutions attempt to solve a homogeneous matrix equation of the form AX = XB. This article makes the following main contributions. First, we show that there are two possible formulations of the hand-eye calibration problem. One formulation is the classic one just mentioned. A second formulation takes the form of the following homogeneous matrix equation: MY = M'YB. The advantage of the latter formulation is that the extrinsic and intrinsic parameters of the camera need not be made explicit. Indeed, this formulation directly uses the 3 x 4 perspective matrices (M and M') associated with two positions of the camera with respect to the calibration frame. Moreover, this formulation together with the classic one covers a wider range of camera-based sensors to be calibrated with respect to the robot hand: single scan-line cameras, stereo heads, range finders, etc. Second, we develop a common mathematical framework to solve the hand-eye calibration problem using either of the two formulations. We represent rotation by a unit quaternion and present two methods: (1) a closed-form solution that solves for rotation using unit quaternions and then solves for translation and (2) a nonlinear technique that solves for rotation and translation simultaneously. Third, we perform a stability analysis both for our two methods and for the linear method developed by Tsai and Lenz (1989).
This analysis allows the comparison of the three methods. In light of this comparison, the nonlinear optimization method, which solves for rotation and translation simultaneously, seems to be the most robust one with respect to noise and measurement errors.
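The classic constraint AX = XB can be checked on synthetic data with a few lines of plain Python. The sketch below is only an illustration: it does not solve for X and is not the method of the article. It assumes, as a convention of this example, that B is a hand motion, X is the unknown hand-eye transform, and A is the corresponding sensor motion; the helper names (`rigid`, `inverse`, `rotation_angle`) and all numeric values are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rigid(axis, angle, t):
    """4x4 rigid transform from an axis-angle rotation (Rodrigues) and a translation."""
    n = math.sqrt(sum(c * c for c in axis))
    x, y, z = (c / n for c in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    R = [[c + x * x * C, x * y * C - z * s, x * z * C + y * s],
         [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
         [z * x * C - y * s, z * y * C + x * s, c + z * z * C]]
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def inverse(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def rotation_angle(T):
    """Rotation angle recovered from the trace of the 3x3 rotation block."""
    trace = T[0][0] + T[1][1] + T[2][2]
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))

# Hypothetical ground truth: hand-eye transform X and one hand motion B.
X = rigid((0.0, 0.0, 1.0), 0.3, (0.10, 0.00, 0.05))
B = rigid((1.0, 1.0, 0.0), 0.7, (0.20, -0.10, 0.00))

# A sensor motion consistent with AX = XB is A = X B X^-1.
A = matmul(matmul(X, B), inverse(X))

AX, XB = matmul(A, X), matmul(X, B)
residual = max(abs(AX[i][j] - XB[i][j]) for i in range(4) for j in range(4))
angle_gap = abs(rotation_angle(A) - rotation_angle(B))
print(residual, angle_gap)
```

Because A and B are conjugate through X, they share the same rotation angle. A single motion pair therefore cannot fix the rotation of X about the motion axis, which is why calibration procedures typically use several motions with non-parallel rotation axes.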
This paper addresses the issue of matching rigid and articulated shapes through probabilistic point registration. The problem is recast into a missing data framework where unknown correspondences are handled via mixture models. Adopting a maximum likelihood principle, we introduce an innovative EM-like algorithm, namely, the Expectation Conditional Maximization for Point Registration (ECMPR) algorithm. The algorithm allows the use of general covariance matrices for the mixture model components and improves over the isotropic covariance case. We analyze in detail the associated consequences in terms of estimation of the registration parameters, and propose an optimal method for estimating the rotational and translational parameters based on semidefinite positive relaxation. We extend rigid registration to articulated registration. Robustness is ensured by detecting and rejecting outliers through the addition of a uniform component to the Gaussian mixture model at hand. We provide an in-depth analysis of our method and compare it both theoretically and experimentally with other robust methods for point registration.
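The role of the uniform component can be illustrated with a generic posterior computation. The sketch below is not the ECMPR algorithm: it assumes one-dimensional data, a single shared variance, equal weights for the Gaussian components, and a uniform outlier density over an interval of known length. The function names and parameter values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, means, var, inlier_weight, support):
    """Posterior of each Gaussian component and of the uniform outlier class.

    inlier_weight is shared equally among the Gaussian components; the
    remaining mass goes to a uniform density of height 1 / support.
    """
    k = len(means)
    inlier_terms = [inlier_weight / k * gaussian_pdf(x, m, var) for m in means]
    outlier_term = (1.0 - inlier_weight) / support
    total = sum(inlier_terms) + outlier_term
    return [term / total for term in inlier_terms], outlier_term / total

means = [0.0, 5.0]  # hypothetical component centres
near, near_out = posteriors(0.05, means, var=0.25, inlier_weight=0.9, support=100.0)
far, far_out = posteriors(50.0, means, var=0.25, inlier_weight=0.9, support=100.0)
print(near, near_out)  # the first component dominates
print(far, far_out)    # the outlier class dominates
```

A point far from every component receives almost all of its posterior mass from the uniform class, so it contributes very little to any parameter update that is weighted by the Gaussian posteriors.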
Recent advances in human motion analysis have made the extraction of human skeleton structure feasible, even from single depth images. This structure has been proven quite informative for discriminating actions in a recognition scenario. In this context, we propose a local skeleton descriptor that encodes the relative position of joint quadruples. Such a coding implies a similarity normalisation transform that leads to a compact (6D) view-invariant skeletal feature, referred to as skeletal quad. Further, the use of a Fisher kernel representation is suggested to describe the skeletal quads contained in a (sub)action. A Gaussian mixture model is learnt from training data, so that the generation of any set of quads is encoded by its Fisher vector. Finally, a multi-level representation of Fisher vectors leads to an action description that roughly carries the order of sub-actions within each action sequence. Efficient classification is here achieved by linear SVMs. The proposed action representation is tested on widely used datasets, MSRAction3D and HDM05. The experimental evaluation shows that the proposed method outperforms state-of-the-art algorithms that rely only on joints, while it competes with methods that combine joints with extra cues.
In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach. We explore the use of neural networks as an alternative to a popular speech variance model based on supervised non-negative matrix factorization (NMF). More precisely, we use a variational autoencoder as a speaker-independent supervised generative speech model, highlighting the conceptual similarities that this approach shares with its NMF-based counterpart. In order to be free of generalization issues regarding the noisy recording environments, we follow the approach of having a supervised model only for the target speech signal, the noise model being based on unsupervised NMF. We develop a Monte Carlo expectation-maximization algorithm for inferring the latent variables in the variational autoencoder and estimating the unsupervised model parameters. Experiments show that the proposed method outperforms a semi-supervised NMF baseline and a state-of-the-art fully supervised deep learning approach.
In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose a tractable inverse regression method which exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor. We introduce a mixture of locally-linear probabilistic mappings that starts with estimating the parameters of inverse regression, and follows with inferring closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, we introduce a partially-latent paradigm, such that the vector-valued response variable is composed of both observed and latent entries, thus being able to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation could be viewed as a latent-variable augmentation of regression. We devise expectation-maximization (EM) procedures based on a data augmentation strategy which facilitates the maximum-likelihood search over the model parameters. We propose two augmentation schemes and we describe in detail the associated EM inference procedures that may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. We provide experimental evidence that our method outperforms several existing regression techniques.
In this paper we propose a method to solve the stereo correspondence problem. The method matches features and feature relationships and can be paraphrased as follows. Linear edge segments are extracted from both the left and right images. Each such segment is characterized by its position and orientation in the image as well as its relationships with the nearby segments. A relational graph is thus built from each image. For each segment in one image a set of potential assignments in the other image is determined. These assignments are represented as nodes in a correspondence graph. Arcs in this graph represent compatible assignments established on the basis of segment relationships. Stereo matching becomes equivalent to searching for sets of mutually compatible nodes in this graph. These sets are found by looking for maximal cliques. The maximal clique best suited to represent a stereo correspondence is selected using a benefit function. Finally, we show numerous results obtained with this method.
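The clique search described above can be sketched on a toy correspondence graph. The example below uses the basic Bron-Kerbosch enumeration, which is a standard algorithm and not necessarily the one used in the paper. The segment labels, the compatibility arcs, and the benefit function (here simply the clique size) are all assumptions made for illustration.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adj maps each node to the frozenset of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already processed nodes
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), frozenset(adj), frozenset())
    return cliques

# Nodes are hypothetical (left segment, right segment) assignments.
a, b, c = ("L1", "R1"), ("L2", "R2"), ("L3", "R3")
d, e = ("L1", "R2"), ("L2", "R1")
# Arcs join assignments that are assumed to be mutually compatible.
arcs = [(a, b), (a, c), (b, c), (d, e)]

neighbours = {node: set() for node in (a, b, c, d, e)}
for u, v in arcs:
    neighbours[u].add(v)
    neighbours[v].add(u)
adj = {node: frozenset(ns) for node, ns in neighbours.items()}

cliques = maximal_cliques(adj)
# Assumed benefit function: prefer the clique with the most assignments.
best = max(cliques, key=len)
print(sorted(best))
```

In this toy graph the three assignments a, b, and c are pairwise compatible and form the largest maximal clique, while d and e form a competing, smaller interpretation of the same segments.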
This paper describes a probabilistic generative model and its associated algorithm to jointly register multiple point sets. The vast majority of state-of-the-art registration techniques select one of the sets as the "model" and perform pairwise alignments between the other sets and this set. The main drawback of this mode of operation is that there is no guarantee that the model set is free of noise and outliers, which contaminate the estimation of the registration parameters. Unlike previous work, the proposed method treats all the point sets on an equal footing: they are realizations of a Gaussian mixture model (GMM) and the registration is cast into a clustering problem. We formally derive an EM algorithm that estimates both the GMM parameters and the rotations and translations that map each individual set onto the "central" model. The mixture means play the role of the registered set of points while the variances provide rich information about the quality of the registration. We thoroughly validate the proposed method with challenging datasets, we compare it with several state-of-the-art methods, and we show its potential for fusing real depth data.
Time-of-flight (TOF) cameras are sensors that can measure the depths of scene points by illuminating the scene with a controlled laser or LED source and then analyzing the reflected light. In this paper we will first describe the underlying measurement principles of time-of-flight cameras, including: (i) pulsed-light cameras, which measure directly the time taken for a light pulse to travel from the device to the object and back again, and (ii) continuous-wave modulated-light cameras, which measure the phase difference between the emitted and received signals, and hence obtain the travel time indirectly. We review the main existing designs, including prototypes as well as commercially available devices. We also review the relevant camera calibration principles, and how they are applied to TOF devices. Finally, we discuss the benefits and challenges of combined TOF and color camera systems.
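The two measurement principles translate into short depth formulas. The sketch below assumes an idealized sensor with no noise or calibration offsets; the function names, the 20 ns round-trip time, and the 30 MHz modulation frequency are illustrative choices, not values from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def depth_from_pulse(round_trip_time):
    """Pulsed light: the pulse travels to the object and back, so halve the path."""
    return SPEED_OF_LIGHT * round_trip_time / 2.0

def depth_from_phase(phase_shift, modulation_frequency):
    """Continuous wave: a phase shift of 2*pi corresponds to one modulation period."""
    travel_time = phase_shift / (2.0 * math.pi * modulation_frequency)
    return SPEED_OF_LIGHT * travel_time / 2.0

def unambiguous_range(modulation_frequency):
    """Largest depth before the measured phase wraps around at 2*pi."""
    return SPEED_OF_LIGHT / (2.0 * modulation_frequency)

pulse_depth = depth_from_pulse(20e-9)          # 20 ns round trip
phase_depth = depth_from_phase(math.pi, 30e6)  # half-cycle shift at 30 MHz
max_depth = unambiguous_range(30e6)
print(pulse_depth, phase_depth, max_depth)
```

The phase is only measured modulo 2π, so a continuous-wave camera cannot distinguish depths that differ by a multiple of the unambiguous range. Lowering the modulation frequency extends that range.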