Quo Vadis, Skeleton Action Recognition?

Gupta, Pranay; Thatipelli, Anirudh; Aggarwal, Aditya; Maheshwari, Shubh; Trivedi, Neel; Das, Sourav; Sarvadevabhatla, Ravi Kiran

doi:10.1007/s11263-021-01470-y

Cited by 40 publications

(23 citation statements)

References 51 publications


“…It requires researchers to employ other pose estimation methods (e.g., OpenPose [17], OpenPifPaf [18], MMPose [19], VIBE [20]) to extract and pre-process the skeletal representation such that the pre-processed skeletal data could be ready for training and evaluating the deep learning recognition models. Gupta et al [21] made an effort to organize a couple of skeletal datasets obtained from other public datasets [22,23] that were still collected from constrained environments and crowd-sourcing methods rather than real public spaces (In The Wild-ITW ), except that they do not provide the software to elaborate these data. We analyzed some relevant skeleton-based HAR models since 2018 to check how public datasets were used to train and evaluate models in the community.…”

Section: Model (mentioning)

confidence: 99%

PyHAPT: A Python-based Human Activity Pose Tracking data processing framework

Quan

Bonarini

2022

Software Impacts



“…Skeletics-152: Skeletics-152 [1] is a skeleton action dataset extracted from the Kinetics700 [35] dataset with the VIBE [36] pose estimator. Because Kinetics-700 has some activities without people and some that are to classify within the context of what humans interact with, 152 classes out of the 700 total classes are chosen to build Skeletics-152.…”

Section: A Datasets (mentioning)

confidence: 99%

“…According to the utilized types of input data, action recognition methods are roughly categorized into image-based, skeleton-based, and hybrid approaches. In image-based approaches, optical flows, which refer to the point correspondences across pairs of images have been commonly used to represent the apparent motions of subjects of interest [1]. However, these methods often require time-consuming and storage-demanding subprocesses.…”

Section: Introduction (mentioning)

confidence: 99%

Rank-GCN for Robust Action Recognition

et al. 2022


We present a robust skeleton-based action recognition method with graph convolutional network (GCN) that uses the new adjacency matrix, called Rank-GCN. In Rank-GCN, the biggest change from previous approaches is how the adjacency matrix is generated to accumulate features from neighboring nodes by re-defining "adjacency." The new adjacency matrix, which we call the rank adjacency matrix, is generated by ranking all the nodes according to metrics including the Euclidean distance from the nodes of interest, whereas the previous GCNs methods used only 1-hop neighboring nodes to construct adjacency. By adopting the rank adjacency matrix, we find not only performance improvements but also robustness against swapping, location shifting and dropping of certain nodes. The fact that the human-made rank adjacency matrix wins against the deep-learning-based matrix, implies that there are still some parts that need touch of humans. We expect our Rank-GCN can make performance improvements especially when the predicted human joints are less accurate and unstable.
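The abstract above describes building adjacency by ranking all nodes according to their Euclidean distance from a node of interest. A minimal Python sketch of that idea follows. It is not the Rank-GCN authors' implementation: the function name `rank_adjacency`, the `bin_edges` parameter, and the choice to emit one binary matrix per rank range are all assumptions made for illustration.

```python
import math


def rank_adjacency(joints, bin_edges):
    """Build one binary adjacency matrix per rank range (illustrative only).

    joints    -- list of joint coordinates, e.g. [(x, y, z), ...]
    bin_edges -- increasing rank boundaries (hypothetical parameter);
                 [0, 1, 2, 4] gives ranges [0, 1), [1, 2), [2, 4)
    """
    n = len(joints)
    n_bins = len(bin_edges) - 1
    mats = [[[0] * n for _ in range(n)] for _ in range(n_bins)]
    for i in range(n):
        # Sort all node indices by Euclidean distance from node i.
        # sorted() is stable, so equidistant nodes keep index order.
        order = sorted(range(n), key=lambda j: math.dist(joints[i], joints[j]))
        for rank, j in enumerate(order):
            for k in range(n_bins):
                if bin_edges[k] <= rank < bin_edges[k + 1]:
                    mats[k][i][j] = 1
    return mats


# Four joints on a line; rank 0 is the node itself (distance zero).
example_joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
A = rank_adjacency(example_joints, bin_edges=[0, 1, 2, 4])
```

Because the neighborhoods are defined by rank rather than by skeleton edges, each row of each matrix has a fixed number of nonzero entries no matter how the joints are labeled or where the skeleton sits in space. This may be one intuition for the robustness to swapping and shifting that the abstract reports, though the sketch does not demonstrate it.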


“…Action recognition is fundamental in video-based tasks with many approaches proposed [20], [21], [22], [23], [24], [25], [26], [27], [28], [29] and datasets [30], [31], [18], [17], [19], [32], [33], [34]. We notice that there is also a trend for more fine-grained action understanding, from video classification [20], [21] to spatial-temporal action detection [32], [35], [36], [14], and human-part level action recognition [15].…”

Section: Related Work a Video Action Understanding (mentioning)

confidence: 99%

Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition

Li, Lan, Zeng et al. 2021

Preprint


Skeleton data carries valuable motion information and is widely explored in human action recognition. However, not only the motion information but also the interaction with the environment provides discriminative cues to recognize the action of persons. In this paper, we propose a joint learning framework for mutually assisted "interacted object localization" and "human action recognition" based on skeleton data. The two tasks are serialized together and collaborate to promote each other, where preliminary action type derived from skeleton alone helps improve interacted object localization, which in turn provides valuable cues for the final human action recognition. Besides, we explore the temporal consistency of interacted object as constraint to better localize the interacted object with the absence of ground-truth labels. Extensive experiments on the datasets of SYSU-3D, NTU60 RGB+D and Northwestern-UCLA show that our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition. Visualization results show that our method can also provide reasonable interacted object localization results.

