The human body skeleton, modeled as a spatiotemporal graph, has attracted growing attention from researchers, who adopt graph convolutional networks (GCNs) to mine discriminative features from skeleton joints. However, one weakness of GCNs is their inability to capture long-range dependencies between joints. To address this, the graph attention network (GAT) was recently proposed; it combines graph convolutions with a self-attention mechanism to identify the most informative joints of a human skeleton and improve model accuracy. However, GAT can compute only static attention: the ranking of attention scores is the same for every query node, which severely limits the expressivity of the attention mechanism. In this work, we present a spatial-temporal dynamic graph attention network (ST-DGAT) to learn the spatial-temporal patterns of skeleton sequences. To obtain dynamic graph attention, we change the order of the weighted vector operations in GAT; the resulting approach achieves a global approximate attention function, making it strictly more expressive than GAT. Experiments show that, by changing the order of GAT's internal operations, the proposed model achieves better action classification results while maintaining the same computational cost as GAT. The proposed framework has been evaluated on the well-known, publicly available large-scale datasets NTU60, NTU120, and Kinetics-400, where it outperforms state-of-the-art (SOTA) results with accuracies of 96.4%, 88.2%, and 61.0%, respectively.
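The difference between static and dynamic attention can be illustrated with a minimal NumPy sketch. It is not the ST-DGAT implementation: the abstract does not give the exact formulation, so the dynamic variant here assumes the commonly used reordering in which the nonlinearity is applied before the attention vector. All function names, shapes, and parameters (`gat_scores`, `dynamic_scores`, `a_src`, `a_dst`, `W_src`, `W_dst`) are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """Elementwise LeakyReLU; strictly increasing for slope > 0."""
    return np.where(x > 0, x, slope * x)

def softmax(e, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = e - e.max(axis=axis, keepdims=True)
    p = np.exp(e)
    return p / p.sum(axis=axis, keepdims=True)

def gat_scores(h, W, a_src, a_dst):
    """Static (GAT-style) scores: e_ij = LeakyReLU(a_src.Wh_i + a_dst.Wh_j).

    The score is a monotonic function of a sum of a query-only term and a
    key-only term, so every query i ranks the keys j in the same order.
    """
    z = h @ W                      # (N, F') projected joint features
    s = z @ a_src                  # (N,) query-side term
    t = z @ a_dst                  # (N,) key-side term
    return leaky_relu(s[:, None] + t[None, :])   # (N, N)

def dynamic_scores(h, W_src, W_dst, a):
    """Dynamic scores (assumed reordering): e_ij = a . LeakyReLU(W_src h_i + W_dst h_j).

    Applying the nonlinearity before the attention vector lets the key
    ranking depend on the query node.
    """
    pre = (h @ W_src)[:, None, :] + (h @ W_dst)[None, :, :]   # (N, N, F')
    return leaky_relu(pre) @ a                                # (N, N)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_joints, f_in, f_out = 25, 3, 8   # illustrative sizes: 25 joints, 3-D coordinates
    h = rng.normal(size=(n_joints, f_in))
    W = rng.normal(size=(f_in, f_out))
    W2 = rng.normal(size=(f_in, f_out))
    a_src, a_dst = rng.normal(size=f_out), rng.normal(size=f_out)

    static_top = np.argmax(gat_scores(h, W, a_src, a_dst), axis=1)
    dynamic_top = np.argmax(dynamic_scores(h, W, W2, a_src), axis=1)
    print("static: most-attended joint per query ->", np.unique(static_top))
    print("dynamic: most-attended joint per query ->", np.unique(dynamic_top))
```

In the static variant every query joint selects the same most-attended joint, because the key-side term alone fixes the ranking. In the dynamic variant the selection is free to vary with the query. Both variants use projections and an attention vector of the same sizes, which is in line with the abstract's statement that the reordering keeps the same computational cost as GAT.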