In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And, what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-ofthe-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, view point and articulation distributions. Our findings include: (1) isolated 3D hand pose estimation achieves low mean errors (10 mm) in the view point range of [70, 120] degrees, but it is far from being solved for extreme view points; (2) 3D volumetric representations outperform 2D CNNs, better capturing the spatial structure of the depth data; (3) Discriminative methods still generalize poorly to unseen hand shapes; (4) While joint occlusions pose a challenge for most methods, explicit modeling of structure constraints can significantly narrow the gap between errors on visible and occluded joints.
Although skeleton-based action recognition has achieved great success in recent years, most of the existing methods may suffer from a large model size and slow execution speed. To alleviate this issue, we analyze skeleton sequence properties to propose a Double-feature Double-motion Network (DD-Net) for skeleton-based action recognition. By using a lightweight network structure (i.e., 0.15 million parameters), DD-Net can reach a super fast speed, as 3,500 FPS on one GPU, or, 2,000 FPS on one CPU. By employing robust features, DD-Net achieves the state-of-the-art performance on our experiment datasets: SHREC (i.e., hand actions) and JHMDB (i.e., body actions). Our code is on https://github.com/fandulu/DD-Net.
Graph convolutional neural networks (GCNNs) aim to extend the data representation and classification capabilities of convolutional neural networks, which are highly effective for signals defined on regular Euclidean domains, e.g. image and audio signals, to irregular, graph-structured data defined on non-Euclidean domains. Graph-theoretic tools that enable us to study the brain as a complex system are of great significance in brain connectivity studies. Particularly, in the context of Alzheimer’s disease (AD), a neurodegenerative disorder associated with network dysfunction, graph-based tools are vital for disease classification and staging. Here, we implement and test a multi-class GCNN classifier for network-based classification of subjects on the AD spectrum into four categories: cognitively normal, early mild cognitive impairment, late mild cognitive impairment, and AD. We train and validate the network using structural connectivity graphs obtained from diffusion tensor imaging data. Using receiver operating characteristic curves, we show that the GCNN classifier outperforms a support vector machine classifier by margins that are reliant on disease category. Our findings indicate that the performance gap between the two methods increases with disease progression from CN to AD. We thus demonstrate that GCNN is a competitive tool for staging and classification of subjects on the AD spectrum.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.