Human action recognition plays a key role in human-computer interaction in complex environments. However, similar actions can yield nearly indistinguishable feature sequences, reducing recognition accuracy. This paper proposes a multimodal method, Action-Fusion Multi-label Subspace Learning (MLSL), which combines skeleton data with a depth-map representation called Depth Sequential Information Entropy Maps (DSIEM) for human action recognition. DSIEM describes the spatial information of human motion with information entropy and the temporal information through stitching; it reduces the redundancy of depth sequences and effectively captures spatial motion states. MLSL models the relationships among modalities and the inherent connections among labels. The method is evaluated on three public datasets: the Microsoft Action 3D dataset (MSR Action3D), the University of Texas at Dallas Multimodal Human Action Dataset (UTD-MHAD), and UTD-MHAD Kinect Version 2 (UTD-MHAD-Kinect V2). Experimental results show that the proposed MLSL model achieves new state-of-the-art results, with average recognition accuracies of 93.55% on MSR Action3D, 88.37% on UTD-MHAD, and 90.66% on UTD-MHAD-Kinect V2.
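The abstract does not specify how the entropy maps are computed, but the idea of "spatial information via entropy, temporal information via stitching" can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patch size, bin count, and the choice of patch-wise Shannon entropy with horizontal stitching are all assumptions.

```python
import numpy as np

def patch_entropy(patch, bins=16):
    """Shannon entropy (bits) of the depth values in one patch."""
    hist, _ = np.histogram(patch, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def dsiem_sketch(depth_seq, patch=8, bins=16):
    """Hypothetical DSIEM-style map: per-frame patch-entropy maps
    (spatial information) stitched along the temporal axis."""
    maps = []
    for frame in depth_seq:                       # frame: (H, W) depth map
        h, w = frame.shape
        ent = np.zeros((h // patch, w // patch))
        for i in range(h // patch):
            for j in range(w // patch):
                block = frame[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
                ent[i, j] = patch_entropy(block, bins)
        maps.append(ent)
    return np.hstack(maps)                        # temporal stitching

# toy usage: 4 depth frames of size 32x32
seq = np.random.rand(4, 32, 32)
img = dsiem_sketch(seq)
print(img.shape)  # (4, 16): four 4x4 entropy maps stitched side by side
```

High-entropy patches correspond to regions where depth values vary strongly, so the stitched map compresses a whole depth sequence into a single 2-D image whose columns encode time.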
In contemporary research on human action recognition, most methods consider the movement features of each joint separately, ignoring that human action results from the coordinated movement of all joints. To address this problem, this paper proposes two action feature representations: the Motion Collaborative Spatio-Temporal Vector (MCSTV) and the Motion Spatio-Temporal Map (MSTM). MCSTV comprehensively captures the integral and cooperative relationships among the motion joints by accumulating weighted limb motion vectors into a new vector that characterizes the movement of the whole action. To describe the action more comprehensively and accurately, we extract key motion energy via key-information extraction based on inter-frame energy fluctuation, project the energy onto three orthogonal axes, and stitch the projections in temporal order to construct the MSTM. To combine the advantages of MSTM and MCSTV, we propose Multi-Target Subspace Learning (MTSL), which projects MSTM and MCSTV into a common subspace where they complement each other. Results on MSR-Action3D and UTD-MHAD show that our method achieves higher recognition accuracy than most existing human action recognition algorithms.
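The weighted accumulation of limb motion vectors described above can be illustrated with a short sketch. This is an assumption-laden toy, not the paper's algorithm: the joint layout, the per-limb weights, and the use of simple frame-to-frame displacement as the "motion vector" are all hypothetical choices.

```python
import numpy as np

def mcstv_sketch(joints, limb_weights):
    """Hypothetical MCSTV-style feature: per-joint motion vectors
    between consecutive frames, weighted per limb and accumulated
    into one collaborative vector per time step.

    joints:       (T, J, 3) skeleton joint positions over T frames
    limb_weights: (J,) importance weight for each joint/limb
    """
    disp = np.diff(joints, axis=0)                  # (T-1, J, 3) joint motion
    weighted = disp * limb_weights[None, :, None]   # weight each limb's motion
    return weighted.sum(axis=1)                     # (T-1, 3) accumulated vector

# toy usage: 5 frames, 4 joints
T, J = 5, 4
joints = np.cumsum(np.random.rand(T, J, 3), axis=0)
w = np.array([0.4, 0.3, 0.2, 0.1])                  # hypothetical limb weights
v = mcstv_sketch(joints, w)
print(v.shape)  # (4, 3): one collaborative motion vector per frame transition
```

The point of the accumulation is that the resulting vector reflects how the joints move together: limbs moving in concert reinforce the sum, while uncoordinated motion partially cancels.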