2018
DOI: 10.48550/arxiv.1805.11790
Preprint

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

Thao Minh Le, Nakamasa Inoue, Koichi Shinoda

Abstract: This paper presents a new framework for human action recognition from a 3D skeleton sequence. Previous studies do not fully utilize the temporal relationships between video segments in a human action. Some studies successfully used very deep Convolutional Neural Network (CNN) models but often suffer from the data insufficiency problem. In this study, we first segment a skeleton sequence into distinct temporal segments in order to exploit the correlations between them. The temporal and spatial features of a ske…

Cited by 4 publications (5 citation statements). References 27 publications.
“…The best performance was recorded by the CNN models. Although RNNs and HNNs employ recurrent layers that are specifically designed for processing sequential data, the results were not too surprising to our team for two key reasons: (1) the employed dataset is fairly small, consisting of fewer than 200 repetitions per exercise, and (2) a growing body of work reports improved performance by CNNs on time-series and movement data [43], [44]. More specifically, recurrent networks use a larger number of parameters, so they are more prone to overfitting on smaller datasets.…”
Section: Discussion
confidence: 94%
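The overfitting argument above can be made concrete by counting parameters: a single LSTM layer carries roughly four times the weights of a comparably sized 1D convolution over the same channels. A minimal sketch, where the layer sizes (64 channels, kernel width 3) are illustrative assumptions, not values taken from the cited works:

```python
# Parameter counts for a 1D convolution vs. an LSTM layer on the
# same 64-channel input (sizes are illustrative assumptions).

def conv1d_params(in_ch, out_ch, kernel):
    # weights: in_ch * out_ch * kernel, plus one bias per output channel
    return in_ch * out_ch * kernel + out_ch

def lstm_params(input_size, hidden_size):
    # four gates, each with input-to-hidden and hidden-to-hidden
    # weight matrices plus a bias vector
    return 4 * ((input_size + hidden_size) * hidden_size + hidden_size)

cnn = conv1d_params(64, 64, 3)  # 12,352 parameters
rnn = lstm_params(64, 64)       # 33,024 parameters
print(cnn, rnn)
```

With more parameters to fit from the same small pool of repetitions, the recurrent layer has more capacity to memorize noise, which is the overfitting risk the statement describes.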
“…There is also a large volume of work on two-person interaction classification from videos (e.g., [31]) and skeletal data (e.g., [32, 33, 34, 35, 36, 37, 38, 39]). Some of these models incorporate temporal [31, 37], spatial and temporal [34], or multilayer feature [35] attention mechanisms.…”
Section: Related Work
confidence: 99%
“…We compare our model with other methods in Table 3, including LRCN [27], 3D-ConvNet [28], Two-Stream I3D [23], Two-Stream-CBAM-I3D [24], and AFSD [29]. The metrics used for the experimental evaluation of action recognition are accuracy, precision, recall, and a weighted composite index of the three.…”
Section: Experiments Of Action Recognition
confidence: 99%
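As a reference for how those three metrics combine, a minimal sketch of accuracy, precision, recall, and a weighted composite for binary predictions. The equal weights are an assumption for illustration; the cited statement does not specify its weighting scheme:

```python
# Accuracy, precision, recall, and a weighted composite score for
# binary labels (the equal weights are an illustrative assumption).

def scores(y_true, y_pred, weights=(1/3, 1/3, 1/3)):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    composite = sum(w * m for w, m in zip(weights, (accuracy, precision, recall)))
    return accuracy, precision, recall, composite

acc, prec, rec, comp = scores([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(acc, prec, rec, comp)  # 0.6, 0.666..., 0.666..., 0.644...
```

In multi-class action recognition these would typically be computed per class and averaged, but the composite idea is the same: a single scalar that balances the three measures.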