2017
DOI: 10.1109/lsp.2017.2690339
SkeletonNet: Mining Deep Part Features for 3-D Action Recognition

Abstract: This letter presents SkeletonNet, a deep learning framework for skeleton-based 3-D action recognition. Given a skeleton sequence, the spatial structure of the skeleton joints in each frame and the temporal information between multiple frames are two important factors for action recognition. We first extract body-part-based features from each frame of the skeleton sequence. Compared to the original coordinates of the skeleton joints, the proposed features are translation, rotation, and scale invariant. To learn…
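The abstract's claim of translation- and scale-invariant joint features can be illustrated with a minimal sketch. This is an assumption-laden toy example, not the paper's actual body-part features (which are only partially described in the truncated abstract); the reference-joint choice and the scale normalisation are both hypothetical.

```python
import numpy as np

def invariant_features(joints, ref_idx=0):
    """Translation- and scale-invariant joint features (illustrative sketch).

    joints:  (J, 3) array of 3-D joint coordinates for one frame.
    ref_idx: index of a reference joint (e.g. the hip centre) -- an
             assumption; the paper's exact construction differs.
    Rotation invariance would additionally require aligning the skeleton
    to a body-centric frame, which is omitted here for brevity.
    """
    # Translation invariance: express all joints relative to the reference.
    rel = joints - joints[ref_idx]
    # Scale invariance: normalise by the mean distance to the reference.
    scale = np.linalg.norm(rel, axis=1).mean()
    return rel / scale if scale > 0 else rel
```

Shifting or uniformly scaling the input skeleton leaves the output unchanged, which is the invariance property the abstract refers to.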

Cited by 149 publications (75 citation statements); references 30 publications.
“…CS and CV accuracies:

Method                           CS      CV
SkeletonNet (CNN) [63]           75.94%  81.16%
JDM+CNN [64]                     76.20%  82.30%
Clips+CNN+MTLN [65]              79.57%  84.83%
Enhanced Visualization+CNN [66]  80.03%  87.21%
HCN [67]                         86.5%   91.1%
TCN+TTN [68]                     77.55%  84.25%
STGCN [69]                       81.5%   88.3%
PB-GCN [70]                      87.5%   93.2%
1 Layer PLSTM [29]               62.05%  69.40%
2 Layer PLSTM [29]               62.93%  70.27%
JLd+RNN [71]                     70.26%  82.39%
STA-LSTM [72]                    73.40%  81.20%
Pose conditioned STA-LSTM [73]   77.10%  84.50%
1 Layer RNN (reported in [29])

…are present in the scene, the skeleton identity captured by the Kinect sensor may change over time. Therefore, an alignment process was first applied to keep the same skeleton saved in the same data array over time.…”
Section: Methods (mentioning)
confidence: 99%
“…Joint-based methods model the positions and motion of the joints (either individually or in combination) using the coordinates of the joints extracted by the OpenNI tracking framework. For instance, a reference joint may be used and the coordinates of the other joints defined relative to it [10,17,20,21], or the joint orientations may be computed relative to a fixed coordinate system and used to represent the human pose [66]. In body-part-based methods, the human body parts are used to model the human's articulated system.…”
Section: Related Work (mentioning)
confidence: 99%
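The orientation-based representation mentioned in the quote above (joint orientations relative to a fixed coordinate system) can be sketched as unit bone vectors in the world frame. The parent map below is a hypothetical 4-joint chain, not OpenNI's actual skeleton model, which defines many more joints.

```python
import numpy as np

# Hypothetical parent map for a 4-joint chain (hip -> spine -> neck -> head);
# real trackers such as OpenNI define a much larger joint hierarchy.
PARENT = {1: 0, 2: 1, 3: 2}

def bone_orientations(joints):
    """Unit bone vectors in a fixed world frame (sketch of an
    orientation-based pose representation, not OpenNI's actual API).

    joints: (J, 3) array of 3-D joint coordinates for one frame.
    Returns one unit vector per bone in the PARENT map.
    """
    feats = []
    for child, parent in PARENT.items():
        v = joints[child] - joints[parent]
        n = np.linalg.norm(v)
        feats.append(v / n if n > 0 else v)
    return np.stack(feats)
```

Because each bone vector is a difference of two joint positions, the representation is already translation invariant; normalising to unit length additionally discards bone length, leaving only orientation.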
“…Thus, a human skeleton is a point of the Lie group SE(3) × · · · × SE(3), where each action corresponds to a unique evolution of such a point in time. The approach of Ke et al. [20] relies on both body parts and body joints. The human skeleton model was divided into 5 body parts.…”
Section: Related Work (mentioning)
confidence: 99%