Proceedings of the 25th ACM International Conference on Multimedia 2017
DOI: 10.1145/3123266.3123299

3D CNNs on Distance Matrices for Human Action Recognition

Abstract: In this paper we are interested in recognizing human actions from sequences of 3D skeleton data. For this purpose we combine a 3D Convolutional Neural Network with body representations based on Euclidean Distance Matrices (EDMs), which have been recently shown to be very effective to capture the geometric structure of the human pose. One inherent limitation of the EDMs, however, is that they are defined up to a permutation of the skeleton joints, i.e., randomly shuffling the ordering of the joints yields many diffe…
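The permutation issue mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method or code: the helper names `edm` and `permute`, the four-joint pose, and the joint ordering are all made up for illustration. It only shows that reordering the joints rearranges the rows and columns of the distance matrix, so the same pose can produce different-looking inputs.

```python
import math

def edm(joints):
    """Pairwise Euclidean distances between 3D joints, as a nested list."""
    return [[math.dist(a, b) for b in joints] for a in joints]

def permute(joints, order):
    """Reorder the joints according to a list of indices."""
    return [joints[i] for i in order]

# Hypothetical 4-joint pose; the coordinates are illustrative only.
pose = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 2.0)]
order = [2, 0, 3, 1]  # an arbitrary shuffle of the joint indices

D = edm(pose)
D_perm = edm(permute(pose, order))

# The two matrices hold the same distances, but in different positions:
# entry (i, j) of D_perm equals entry (order[i], order[j]) of D.
same_up_to_permutation = all(
    D_perm[i][j] == D[order[i]][order[j]]
    for i in range(len(pose))
    for j in range(len(pose))
)
```

Here `D` and `D_perm` describe the same pose, yet a network that reads the matrix as an image would see two different inputs unless the joint ordering is fixed or otherwise accounted for.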

Cited by 29 publications (11 citation statements)
References 32 publications
“…Sequence-based treats the 3D-skeleton data as a multi-dimensional time-series and models it with a recurrent architecture [21,22,32,35,46] to learn the temporal dynamics of the joints. Image-based create a pseudo-image representation of the 3D-skeleton data [7,12,17,23,38] which is encoded by CNN architectures to model the co-occurrence of multiple joints and their motion. Finally, graph-based [4,13,18,24,31,33,37,44] represents the 3D-skeleton data with a graph consisting of spatial and temporal edges.…”
Section: Related Work (mentioning)
confidence: 99%
“…robust to changes in background and appearance [23,46]. However, learning a good feature space for 3D actions requires large amounts of labeled skeleton data [7,12,35,36,[44][45][46], which is much harder to obtain than large amounts of labeled RGB video. To address this major shortcoming, we propose a new self-supervised contrastive learning method for 3D skeleton data.…”
Section: Introduction (mentioning)
confidence: 99%
“…RNN-based methods [6,8,9,10] aim to capture the temporal dependency of skeleton data and have achieved remarkable performance than manually designed features. CNN-based models [11,12] are also proposed to extract spatial and temporal information by applying convolution in both 3D skeletons and sequences. Recently GCN-based models [13] have been favored for the fine-grained modeling of the spatial structure by using graph representation, and have achieved more impressive performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…The challenge with CNN based methods is the extraction and utilization of spatial as well as temporal information from 3D skeleton sequences. Several other problems hinder these techniques including model size and speed [45], occlusions, CNN architecture definition [30], and viewpoint variations [47]. Skeleton based action recognition using CNNs thus remains a not completely solved research question.…”
Section: Action Recognition (mentioning)
confidence: 99%