2021
DOI: 10.1007/s11263-021-01470-y

Quo Vadis, Skeleton Action Recognition?

Abstract: In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To begin with, we benchmark state-of-the-art models on the NTU-120 dataset and provide multi-layered assessment of the results. To examine skeleton action recognition 'in the wild', we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. The results from benchmarking the top performers of NTU-120 on Skeletics-1…

Cited by 40 publications (23 citation statements)
References 51 publications
“…It requires researchers to employ other pose estimation methods (e.g., OpenPose [17], OpenPifPaf [18], MMPose [19], VIBE [20]) to extract and pre-process the skeletal representation such that the pre-processed skeletal data could be ready for training and evaluating the deep learning recognition models. Gupta et al [21] made an effort to organize a couple of skeletal datasets obtained from other public datasets [22,23] that were still collected from constrained environments and crowd-sourcing methods rather than real public spaces (In The Wild-ITW ), except that they do not provide the software to elaborate these data. We analyzed some relevant skeleton-based HAR models since 2018 to check how public datasets were used to train and evaluate models in the community.…”
Section: Model (mentioning)
confidence: 99%
“…Skeletics-152: Skeletics-152 [1] is a skeleton action dataset extracted from the Kinetics700 [35] dataset with the VIBE [36] pose estimator. Because Kinetics-700 has some activities without people and some that are to classify within the context of what humans interact with, 152 classes out of the 700 total classes are chosen to build Skeletics-152.…”
Section: A Datasets (mentioning)
confidence: 99%
“…According to the utilized types of input data, action recognition methods are roughly categorized into image-based, skeleton-based, and hybrid approaches. In image-based approaches, optical flows, which refer to the point correspondences across pairs of images have been commonly used to represent the apparent motions of subjects of interest [1]. However, these methods often require time-consuming and storage-demanding subprocesses.…”
Section: Introduction (mentioning)
confidence: 99%
“…Action recognition is fundamental in video-based tasks with many approaches proposed [20], [21], [22], [23], [24], [25], [26], [27], [28], [29] and datasets [30], [31], [18], [17], [19], [32], [33], [34]. We notice that there is also a trend for more fine-grained action understanding, from video classification [20], [21] to spatial-temporal action detection [32], [35], [36], [14], and human-part level action recognition [15].…”
Section: Related Work a Video Action Understanding (mentioning)
confidence: 99%