A Novel 3D Human Action Recognition Framework for Video Content Analysis

Wei, Lianglei; Wu, Yirui; Wang, Wenhai; Lü, Tong

doi:10.1007/978-3-319-73603-7_4

Cited by 5 publications

(2 citation statements)

References 15 publications

“…Another category, ie, deep neural network methods, learns spatiotemporal characteristics by automatically extracting distinctive features from large data for accurate recognition. Among the different neural‐based architectures, recurrent neural networks (RNNs), which are specially designed to handle sequential data with variable length, have achieved promising performances in 3D action recognition. For example, Liu et al proposed a long short‐term memory (LSTM) network incorporating a tree structure to describe the relation of human parts, which successfully utilizes the spatiotemporal characteristics of human actions for the recognition task and achieves desirable accuracy on a large data set, ie, NTU RGB+D.…”

Section: Introduction (mentioning)

confidence: 99%

“…[18,19] Among the different neural-based architectures, recurrent neural networks (RNNs), which are specially designed to handle sequential data with variable length, have achieved promising performances in 3D action recognition [20,21]. For example, Liu et al [13] proposed a long short-term memory (LSTM) network incorporating a tree structure to describe the relation of human parts, which successfully utilizes the spatiotemporal characteristics of human actions for the recognition task and achieves desirable accuracy on a large data set, ie, NTU RGB+D [22]. Following on the thought of the modeling relationship of two concurrent domains, ie, spatial and temporal, Hu et al [23] proposed a deep bilinear framework to further describe such relationship, where their proposed modality pooling layer and temporal pooling layer could pool the input action sequence along the modality and temporal directions separately.…”

Section: Introduction (mentioning)

confidence: 99%

Deep spatiotemporal LSTM network with temporal pattern feature for 3D human action recognition

Wei²,

Duan

2019

Computational Intelligence

Self Cite

With the rapid development of RGB‐D cameras and pose estimation techniques, action recognition based on three‐dimensional skeleton data has gained significant attention in the artificial intelligence community. In this paper, we incorporate temporal pattern descriptors of joint positions with the currently popular long short‐term memory (LSTM)–based learning scheme to obtain accurate and robust action recognition. Considering that actions are essentially formed by small subactions, we first utilize a two‐dimensional wavelet transform to extract temporal pattern descriptors in the frequency domain for each subaction. Afterward, we design a novel LSTM structure to extract deep features, which model a long‐term spatiotemporal correlation between body parts. Since temporal pattern descriptors and LSTM deep features can be regarded as multimodal representations for actions, we fuse them with an autoencoder network to achieve a more effective feature descriptor for action recognition. Experimental results on three challenging data sets with several comparative methods demonstrate the effectiveness of the proposed method for three‐dimensional action recognition.
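
The first step of the abstract, extracting frequency-domain descriptors for each subaction with a two-dimensional wavelet transform, can be illustrated with a minimal sketch. The abstract does not name the wavelet, the subaction length, or the skeleton layout, so everything specific here is an assumption made for illustration: a single-level Haar wavelet, non-overlapping windows of 8 frames, and the hypothetical helper names `haar_1d`, `haar_2d`, and `subaction_descriptors`. It is not the authors' implementation.

```python
import numpy as np


def haar_1d(x, axis):
    """Single-level orthonormal Haar transform along one axis.

    Assumption: the Haar wavelet is chosen only for simplicity; the
    abstract does not say which wavelet the authors used.
    """
    if x.shape[axis] % 2 != 0:
        raise ValueError("length along the transformed axis must be even")
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass (local average)
    detail = (even - odd) / np.sqrt(2.0)   # high-pass (local change)
    out = np.concatenate([approx, detail], axis=0)
    return np.moveaxis(out, 0, axis)


def haar_2d(window):
    """2D transform of a (frames, coordinates) window: time axis, then joint axis."""
    return haar_1d(haar_1d(window, axis=0), axis=1)


def subaction_descriptors(sequence, window_len=8):
    """Split a (frames, coordinates) sequence into non-overlapping windows
    (hypothetical stand-ins for subactions) and return one flattened
    coefficient vector per window. Trailing frames that do not fill a
    window are dropped; the sequence must contain at least one full window.
    """
    n_windows = sequence.shape[0] // window_len
    if n_windows == 0:
        raise ValueError("sequence is shorter than one window")
    descriptors = [
        haar_2d(sequence[i * window_len:(i + 1) * window_len]).ravel()
        for i in range(n_windows)
    ]
    return np.stack(descriptors)


# Example: 32 frames of a hypothetical 20-joint skeleton (20 * 3 = 60 coordinates).
rng = np.random.default_rng(0)
skeleton_sequence = rng.normal(size=(32, 60))
features = subaction_descriptors(skeleton_sequence, window_len=8)
```

Because the transform is orthonormal, each descriptor keeps the energy of its window, which makes a convenient sanity check. In the pipeline the abstract describes, descriptors of this kind would then be fused with LSTM features by an autoencoder; that stage is not sketched here.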

Section: Introduction (mentioning)

confidence: 99%