2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00371

Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition

Abstract: Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent hi…
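
The abstract describes two ingredients: a fixed physical-skeleton graph extended to capture richer structural dependencies among joints, and action-specific "actional" links inferred by an encoder-decoder (the A-link inference module). The snippet below is a minimal, hypothetical PyTorch sketch of how such combined links could drive a graph convolution over skeleton features; the class name, the learned-adjacency stand-in for the inferred actional links, and the placeholder identity adjacency are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): graph convolution over a skeleton graph
# whose adjacency combines fixed "structural" links with learned, action-specific links.
import torch
import torch.nn as nn


class ActionalStructuralGraphConv(nn.Module):  # hypothetical name
    def __init__(self, in_channels, out_channels, num_joints, structural_adj):
        super().__init__()
        # Fixed physical-skeleton adjacency ("structural" links), shape (V, V).
        self.register_buffer("A_struct", structural_adj)
        # Learned joint-to-joint weights standing in for inferred "actional" links.
        self.A_act = nn.Parameter(torch.zeros(num_joints, num_joints))
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x):
        # x: (batch, frames, joints, channels)
        A = self.A_struct + torch.softmax(self.A_act, dim=-1)  # combine both link types
        x = torch.einsum("uv,btvc->btuc", A, x)                # aggregate features over joints
        return self.proj(x)


# Toy usage: 25 joints (the NTU RGB+D skeleton size), identity adjacency as a placeholder.
layer = ActionalStructuralGraphConv(3, 64, 25, torch.eye(25))
out = layer(torch.randn(2, 30, 25, 3))  # -> (2, 30, 25, 64)
```

In the actual model, the actional adjacency would be produced by the inference module conditioned on the input sequence rather than stored as a free parameter; the free parameter here only illustrates where the inferred links enter the computation.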

Cited by 852 publications (537 citation statements); references 20 publications.

“…The results of the comparison are shown in Tables 5 and 6. The methods used for comparison include the handcrafted-feature-based methods [33], RNN-based methods [28,29,34,35], CNN-based methods [36,37], and GCN-based methods [6-10]. From Table 5, we can see that our proposed method achieves the best performance, 96.8% and 91.7%, under the two evaluation criteria on the NTU-RGBD dataset.…”
Section: Comparison With the State-of-the-art
confidence: 99%
“…
Method                 Cross-subject (%)   Cross-view (%)
ST-GCN (2018) [6]      81.5                88.3
AS-GCN (2018) [9]      86.8                94.2
PB-GCN (2018) [8]      87.5                93.2
2s-AGCN (2019) [7]     88.5                95.1
AGC-LSTM (2019) [10]   89.2                95.0
ours                   91.7                96.8

Table 6. The results of different methods, which are designed for 3D human activity analysis, using the cross-subject and cross-setup evaluation criteria on the NTU RGB+D 120 dataset.…”
Section: Cross-subject (%) / Cross-view (%)
confidence: 99%
“…Based on this judgment, Yan et al. [23] proposed the spatial-temporal graph convolutional network (ST-GCN), representing human joints as vertices and bones as edges. ST-GCN raised the accuracy of skeleton-based action recognition to a new level, and many ST-GCN variants have subsequently been proposed based on it [24-35]. However, there are still two problems to be addressed in these methods.…”
Section: Introduction
confidence: 99%
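
The excerpt above describes the ST-GCN formulation in which joints become graph vertices and bones become edges. As a concrete toy illustration (with made-up joint indices, not the skeleton layout used in the cited papers), the normalized adjacency matrix that such a spatial graph convolution multiplies by can be built as follows:

```python
# Toy sketch: encode a skeleton as a graph (joints = vertices, bones = edges)
# and build the normalized adjacency used by a spatial graph convolution.
import numpy as np

num_joints = 5                              # hypothetical toy skeleton, not NTU RGB+D's 25 joints
bones = [(0, 1), (1, 2), (1, 3), (3, 4)]    # hypothetical (parent, child) joint pairs

A = np.eye(num_joints)                      # self-links so each joint keeps its own features
for i, j in bones:
    A[i, j] = A[j, i] = 1.0                 # undirected physical (bone) links

# Symmetric normalization D^{-1/2} A D^{-1/2}, as is common in GCN formulations.
deg = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(deg, deg))
print(A_norm.round(2))                      # the matrix a spatial graph convolution multiplies by
```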
“…(1) Because ST-GCN [4] may not adequately capture the dependency between far-apart joints [5], it is unable to effectively extract the global co-occurrence features of actions. (2) Since convolution cannot consider the relationship between each vertex and its surrounding vertices, these related works [4,6,7] may not effectively obtain the spatial features composed of adjacent vertices. (3) These works [4,6,7] expand the number of channels per vertex as the number of network layers increases.…”
Section: Introduction
confidence: 99%