2020 25th International Conference on Pattern Recognition (ICPR), 2021
DOI: 10.1109/icpr48806.2021.9412091
Temporal Attention-Augmented Graph Convolutional Network for Efficient Skeleton-Based Human Action Recognition

Cited by 21 publications (19 citation statements); references 23 publications.
“…(2) and as bilinear mapping of the form in Eq. (4), or if it is adequate to employ linear mappings, effectively leading to neural layers combining a Multilayer Perceptron block with a Temporal Convolution block, we conducted a second set of experiments. In this set of experiments, we used a 10-layer spatio-temporal bilinear network formed by the same data transformation sizes as in the first set of our experiments, but instead of using bilinear mappings with V^(l) = V for all 10 layers of the model, we use V^(l) = V, l = 1, …”

The results table embedded in this excerpt:

Method                   CS (%)   CV (%)   #Streams
HBRNN [8]                59.1     64.0     5
Deep LSTM [10]           60.7     67.3     1
ST-LSTM [9]              69.2     77.7     1
STA-LSTM [11]            73.4     81.2     1
VA-LSTM [12]             79.2     87.7     1
ARRN-LSTM [13]           80.7     88.8     2
2s-3DCNN [14]            66.8     72.6     2
TCN [15]                 74.3     83.1     1
Clips+CNN+MTLN [16]      79.6     84.8     1
Synthesized CNN [17]     80.0     87.2     1
3scale ResNet152 [18]    85.0     92.3     1
CNN+Motion+Trans [19]    83.2     89.3     2
ST-GCN [20]              81.5     88.3     1
DPRL+GCNN [25]           83.5     89.8     1
TA-GCN [26]              87.97    94.2     1
AS-GCN [24]              86.8     94.2     2
2s-AGCN [22]             88.5     95.1     2
2s-TA-GCN [26]           88.5     95.1     2
GCN-NAS [23]             89       …        …
Section: Methods (mentioning; confidence: 99%)
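The bilinear mapping discussed in this excerpt can be sketched in code. This is a minimal illustration assuming the layer form X^(l+1) = σ(W^(l) X^(l) V^(l)), with W^(l) acting on the joint axis and V^(l) on the temporal axis; the class name, shapes, and initialization are illustrative assumptions, not the cited paper's implementation. The design choice the experiment above compares is sharing one V across all 10 layers versus using layer-wise mappings V^(l).

```python
import torch

# One spatio-temporal bilinear layer: W mixes the joint axis on the
# left, V mixes the temporal axis on the right (hypothetical sketch).
class BilinearLayer(torch.nn.Module):
    def __init__(self, in_nodes, out_nodes, in_frames, out_frames):
        super().__init__()
        self.W = torch.nn.Parameter(0.01 * torch.randn(out_nodes, in_nodes))
        self.V = torch.nn.Parameter(0.01 * torch.randn(in_frames, out_frames))

    def forward(self, x):                       # x: (batch, in_nodes, in_frames)
        return torch.relu(self.W @ x @ self.V)  # (batch, out_nodes, out_frames)

layer = BilinearLayer(in_nodes=25, out_nodes=64, in_frames=300, out_frames=150)
print(layer(torch.randn(8, 25, 300)).shape)     # torch.Size([8, 64, 150])
```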
“…AS-GCN [24] extended the skeleton graphs to represent both structural links and actional links, and proposed an actional-structural graph convolutional network with an encoder-decoder structure to capture richer dependencies from actions. DPRL+GCNN [25] and TA-GCN [26] select the most informative skeletons in a sequence to make the inference process more efficient.…”
Section: Introduction (mentioning; confidence: 99%)
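A hedged sketch of the frame-selection idea attributed to DPRL+GCNN and TA-GCN above: score each frame with a small learned function, keep the top-k highest-scoring frames, and pass only those to the downstream GCN. The scoring layer, the value of k, and the tensor layout are assumptions for illustration, not the papers' exact modules.

```python
import torch

class TemporalFrameSelector(torch.nn.Module):
    """Keep the k most informative frames of a skeleton sequence (sketch)."""
    def __init__(self, channels, k):
        super().__init__()
        self.k = k
        self.score = torch.nn.Linear(channels, 1)   # per-frame relevance score

    def forward(self, x):                  # x: (batch, frames, channels)
        s = self.score(x).squeeze(-1)      # (batch, frames)
        idx = torch.topk(s, self.k, dim=1).indices
        idx = idx.sort(dim=1).values       # keep temporal order of chosen frames
        idx = idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        return torch.gather(x, 1, idx)     # (batch, k, channels)

sel = TemporalFrameSelector(channels=75, k=50)      # e.g. 25 joints x 3 coords
print(sel(torch.randn(4, 300, 75)).shape)           # torch.Size([4, 50, 75])
```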
“…GCN-based models for skeleton-based action recognition [15,16,18,22,23,27,28] operate on sequences of skeleton graphs. The spatio-temporal graph of skeletons G = (V, E) has the human body joint coordinates as nodes V and the spatial and temporal connections between them as edges E. Figure 2 (right) illustrates such a spatio-temporal graph where the spatial graph edges encode the human bones and the temporal edges connect the same joints in subsequent time-steps.…”
Section: A Spatio-temporal Graph Convolutional Network (mentioning; confidence: 99%)
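The spatio-temporal graph defined above can be made concrete with a small adjacency-construction sketch: joints are replicated over T frames, spatial edges follow a bone list, and temporal edges connect each joint to itself in the next frame. The 5-joint bone list below is a toy assumption, not an actual skeleton layout.

```python
import torch

def st_adjacency(bones, num_joints, T):
    """Adjacency of the spatio-temporal graph G = (V, E)."""
    N = num_joints * T                   # one node per joint per frame
    A = torch.zeros(N, N)
    for t in range(T):
        o = t * num_joints
        for i, j in bones:               # spatial edges: the human bones
            A[o + i, o + j] = A[o + j, o + i] = 1.0
        if t + 1 < T:                    # temporal edges: same joint,
            for v in range(num_joints):  # consecutive time-steps
                A[o + v, o + num_joints + v] = 1.0
                A[o + num_joints + v, o + v] = 1.0
    return A

A = st_adjacency(bones=[(0, 1), (1, 2), (1, 3), (1, 4)], num_joints=5, T=3)
print(A.shape)   # torch.Size([15, 15])
```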
“…Unfortunately, the high computational complexity of these GCN-based methods makes them infeasible in real-time applications and resource-constrained online inference settings. Multiple approaches have recently been explored to increase the efficiency of skeleton-based action recognition: GCN-NAS [22] and PST-GCN [23] are neural-architecture-search-based methods that try to find an optimized ST-GCN architecture to increase the efficiency of the classification task; ShiftGCN [24] replaces graph and temporal convolutions with a zero-FLOPs shift graph operation and pointwise convolutions as an efficient alternative to the feature-propagation rule for GCNs [25]; ShiftGCN++ [26] boosts the efficiency of ShiftGCN further via progressive architecture search, knowledge distillation, explicit spatial positional encodings, and a Dynamic Shift Graph Convolution; SGN [27] utilizes semantic information such as joint type and frame index as side information to design a compact semantics-guided neural network (SGN) that captures both spatial and temporal correlations at the joint and frame levels; TA-GCN [28] tries to make inference more efficient by selecting from a sequence a subset of key skeletons, which hold the most important features for action recognition, to be processed by the spatio-temporal convolutions.…”
Section: Introduction (mentioning; confidence: 99%)
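The shift idea attributed to ShiftGCN above can be illustrated on the temporal axis: re-index channel groups one frame forward or backward (pure data movement, zero FLOPs) and let a 1x1 pointwise convolution mix the shifted features. Splitting channels into three equal groups is an assumption for illustration, not the paper's exact shift scheme.

```python
import torch

def temporal_shift(x):                        # x: (batch, channels, frames, joints)
    """Shift one channel group forward in time, one backward, keep the rest."""
    c = x.size(1) // 3
    out = torch.zeros_like(x)
    out[:, :c, 1:] = x[:, :c, :-1]            # group 1: shifted forward in time
    out[:, c:2 * c, :-1] = x[:, c:2 * c, 1:]  # group 2: shifted backward
    out[:, 2 * c:] = x[:, 2 * c:]             # remaining channels unchanged
    return out

pointwise = torch.nn.Conv2d(64, 64, kernel_size=1)  # mixes shifted channels
y = pointwise(temporal_shift(torch.randn(2, 64, 300, 25)))
print(y.shape)    # torch.Size([2, 64, 300, 25])
```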
“…More directly related to the problem of human activity recognition, state-of-the-art results are based on Graph Convolutional Networks (GCNs), which treat the human skeleton joints as graph nodes and their connections (bones) as graph edges. For instance, the authors in [13] proposed a module to select the most informative frames in a skeleton sequence and fused it with a GCN module; the authors in [14] presented a GCN architecture that fuses information from both nodes and skeleton edges; and the authors in [15] introduced the spatial-temporal GCN, which applies graph convolutions in the spatial domain and regular convolutions in the temporal domain.…”
Section: Introduction (mentioning; confidence: 99%)
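The spatial/temporal split described for the spatial-temporal GCN in [15] can be sketched as a single block: propagate features over a degree-normalized adjacency, apply a 1x1 convolution per node, then run a regular convolution along the temporal axis. The normalization, the 9-frame temporal kernel, and the toy adjacency are assumptions for illustration, not the paper's exact block.

```python
import torch

class STGCNBlock(torch.nn.Module):
    def __init__(self, in_ch, out_ch, A):
        super().__init__()
        deg = A.sum(dim=1).clamp(min=1)
        self.register_buffer("A_hat", A / deg.unsqueeze(1))  # row-normalized
        self.spatial = torch.nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = torch.nn.Conv2d(out_ch, out_ch, kernel_size=(9, 1),
                                        padding=(4, 0))

    def forward(self, x):                                 # x: (batch, C, T, V)
        x = torch.einsum("nctv,vw->nctw", x, self.A_hat)  # graph propagation
        x = torch.relu(self.spatial(x))                   # per-node features
        return torch.relu(self.temporal(x))               # convolve over time

A = torch.eye(25) + torch.rand(25, 25).round()   # toy adjacency, 25 joints
block = STGCNBlock(in_ch=3, out_ch=64, A=A)
print(block(torch.randn(2, 3, 300, 25)).shape)   # torch.Size([2, 64, 300, 25])
```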