In this paper we introduce a large-scale hand pose dataset, collected using a novel capture method. Existing datasets are either generated synthetically or captured using depth sensors: synthetic datasets exhibit a certain level of appearance difference from real depth images, and real datasets are limited in quantity and coverage, mainly due to the difficulty to annotate them. We propose a tracking system with six 6D magnetic sensors and inverse kinematics to automatically obtain 21-joints hand pose annotations of depth maps captured with minimal restriction on the range of motion. The capture protocol aims to fully cover the natural hand pose space. As shown in embedding plots, the new dataset exhibits a significantly wider and denser range of hand poses compared to existing benchmarks. Current state-of-the-art methods are evaluated on the dataset, and we demonstrate significant improvements in cross-benchmark performance. We also show significant improvements in egocentric hand pose estimation with a CNN trained on the new dataset.
The performance of Neural Machine Translation (NMT) systems often suffers in lowresource scenarios where sufficiently largescale parallel corpora cannot be obtained. Pretrained word embeddings have proven to be invaluable for improving performance in natural language analysis tasks, which often suffer from paucity of data. However, their utility for NMT has not been extensively explored. In this work, we perform five sets of experiments that analyze when we can expect pre-trained word embeddings to help in NMT tasks. We show that such embeddings can be surprisingly effective in some cases -providing gains of up to 20 BLEU points in the most favorable setting. 1
Abstract. Discriminative methods often generate hand poses kinematically implausible, then generative methods are used to correct (or verify) these results in a hybrid method. Estimating 3D hand pose in a hierarchy, where the high-dimensional output space is decomposed into smaller ones, has been shown effective. Existing hierarchical methods mainly focus on the decomposition of the output space while the input space remains almost the same along the hierarchy. In this paper, a hybrid hand pose estimation method is proposed by applying the kinematic hierarchy strategy to the input space (as well as the output space) of the discriminative method by a spatial attention mechanism and to the optimization of the generative method by hierarchical Particle Swarm Optimization (PSO). The spatial attention mechanism integrates cascaded and hierarchical regression into a CNN framework by transforming both the input(and feature space) and the output space, which greatly reduces the viewpoint and articulation variations. Between the levels in the hierarchy, the hierarchical PSO forces the kinematic constraints to the results of the CNNs. The experimental results show that our method significantly outperforms four state-of-the-art methods and three baselines on three public benchmarks.
This article studies constructions of reproducing kernel Banach spaces (RKBSs) which may be viewed as a generalization of reproducing kernel Hilbert spaces (RKHSs). A key point is to endow Banach spaces with reproducing kernels such that machine learning in RKBSs can be well-posed and of easy implementation. First we verify many advanced properties of the general RKBSs such as density, continuity, separability, implicit representation, imbedding, compactness, representer theorem for learning methods, oracle inequality, and universal approximation. Then, we develop a new concept of generalized Mercer kernels to construct p-norm RKBSs for 1 ≤ p ≤ ∞. The p-norm RKBSs preserve the same simple format as the Mercer representation of RKHSs. Moreover, the p-norm RKBSs are isometrically equivalent to the standard p-norm spaces of countable sequences. Hence, the pnorm RKBSs possess more geometrical structures than RKHSs including sparsity. To be more precise, the suitable countable expansion terms of the generalized Mercer kernels can be used to represent the pairs of Schauder bases and biorthogonal systems of the p-norm RKBSs such that the generalized Mercer kernels become the reproducing kernels of the p-norm RKBSs. The generalized Mercer kernels also cover many well-known kernels, for example, min kernels, Gaussian kernels, and power series kernels. Finally, we propose to solve the support vector machines in the p-norm RKBSs, which are to minimize the regularized empirical risks over the p-norm RKBSs. We show that the infinite dimensional support vector machines in the p-norm RKBSs can be equivalently transferred to finite dimensional convex optimization problems such that we obtain the finite dimensional representations of the support vector machine solutions for practical applications. In particular, we verify that some special support vector machines in the 1-norm RKBSs are equivalent to the classical 1-norm sparse regressions. This gives fundamental supports of a novel learning tool called sparse learning methods to be investigated in our next research project.
Learning and predicting the pose parameters of a 3D hand model given an image, such as locations of hand joints, is challenging due to large viewpoint changes and articulations, and severe self-occlusions exhibited particularly in egocentric views. Both feature learning and prediction modeling have been investigated to tackle the problem. Though effective, most existing discriminative methods yield a single deterministic estimation of target poses. Due to their single-value mapping intrinsic, they fail to adequately handle self-occlusion problems, where occluded joints present multiple modes. In this paper, we tackle the self-occlusion issue and provide a complete description of observed poses given an input depth image by a novel method called hierarchical mixture density networks (HMDN). The proposed method leverages the state-of-the-art hand pose estimators based on Convolutional Neural Networks to facilitate feature learning, while it models the multiple modes in a two-level hierarchy to reconcile singlevalued and multi-valued mapping in its output. The whole framework with a mixture of two differentiable density functions is naturally end-to-end trainable. In the experiments, HMDN produces interpretable and diverse candidate samples, and significantly outperforms the state-of-the-art methods on two benchmarks with occlusions, and performs comparably on another benchmark free of occlusions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.