In the task of human motion recognition, the overall action span is changeable, and there may be an inclusion relationship between action semantics. This paper proposes a novel multi-scale time sampling module and a deep spatiotemporal feature extraction module, which strengthens the receptive field of the feature map and strengthens the extraction of spatiotemporal-related feature information via the network. We study and compare the performance of three existing multi-channel fusion methods to improve the recognition accuracy of the network on the open skeleton recognition dataset. In this paper, several groups of comparative experiments are carried out on two public datasets. The experimental results show that compared with the classical 2s-AGCN algorithm, the accuracy of the algorithm proposed in this paper shows an improvement of 1% on the Kinetics dataset and 0.4% and 1% on the two evaluating indicators of the NTU-RGB+D dataset, respectively.
The Long Short-Term Memory (LSTM) network is a classic action recognition method because of its ability to extract time information. Researchers proposed many hybrid algorithms based on LSTM for human action recognition. In this paper, an improved Spatio–Temporal Differential Long Short-Term Memory (ST-D LSTM) network is proposed, an enhanced input differential feature module and a spatial memory state differential module are added to the network. Furthermore, a transmission mode of ST-D LSTM is proposed; this mode enables ST-D LSTM units to transmit the spatial memory state horizontally. Finally, these improvements are added into classical Long-term Recurrent Convolutional Networks (LRCN) to test the new network’s performance. Experimental results show that ST-D LSTM can effectively improve the accuracy of LRCN.
Human action recognition (HAR) is the foundation of human behavior comprehension. It is of great significance and can be used in many real-world applications. From the point of view of human kinematics, the coordination of limbs is an important intrinsic factor of motion and contains a great deal of information. In addition, for different movements, the HAR algorithm provides important, multifaceted attention to each joint. Based on the above analysis, this paper proposes a HAR algorithm, which adopts two attention modules that work together to extract the coordination characteristics in the process of motion, and strengthens the attention of the model to the more important joints in the process of moving. Experimental data shows these two modules can improve the recognition accuracy of the model on the public HAR dataset (NTU-RGB + D, Kinetics-Skeleton).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.