Learning multi-temporal-scale deep information for action recognition

Yao, Guangle; Lei, Tao; Zhong, Jiandan; Jiang, Ping

doi:10.1007/s10489-018-1347-3

Cited by 24 publications

(11 citation statements)

References 33 publications


“…On this dataset, the proposed method (Inception-Resnet-v2 plus WVLGTP) greatly outperforms the VLBP [1], MBP [2], ML-HDP [25], two-stream CNN [10], and multi-resolution CNN [12] by 16.5%, 13.7%, 5.6%, 6.9%, and 30.4%, respectively. In contrast, our approach slightly beats the TDD [31], TC3D [32], Res3D [33], ActionVLAD [35], and Sequential VLAD [36] since these approaches also achieved more discriminative power by considering the deep features and motion feature with CNN. Furthermore, ATW CNN [34] shows almost similar accuracy with our approach, since their approach incorporates the temporal attention with CNN.…”

Section: Methods (mentioning)

confidence: 93%

“…Similarly, WVLGTP shows competitive performance with Dense Trajectories [27], iDT [9], and Line Pooling [37]. However, TDD [31], Res3D [33], Action VLAD [35], and Sequential VLAD [36] show better accuracy than the proposed WVLGTP due to their discriminative power while employing a large number of action categories. On this dataset, the proposed method (Inception-Resnet-v2 plus WVLGTP) greatly outperforms the VLBP [1], MBP [2], ML-HDP [25], two-stream CNN [10], and multi-resolution CNN [12] by 16.5%, 13.7%, 5.6%, 6.9%, and 30.4%, respectively.…”

Section: Methods (mentioning)

confidence: 99%

“…Similar to [31], Lu et al [32] also proposed trajectory pooling approach along with 3D ConvNets for action recognition, in which they computed multiscale dense trajectories and on 3D ConvNets they produced trajectory pooling. In [33], the authors extracted the features in multiple temporal scales and employed Res3D neural network model. Furthermore, they acquired information from RGB channels and optical flow.…”

Section: Related Work (mentioning)

confidence: 99%


Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition

Uddin, Lee. Sensors, 2019.


Human action recognition plays a significant part in the research community due to its emerging applications. A variety of approaches have been proposed to resolve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating the spatial-temporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognize human actions by considering both deep spatial features and handcrafted spatiotemporal features. Firstly, we extract the deep spatial features by employing a state-of-the-art deep convolutional network, namely Inception-Resnet-v2. Secondly, we introduce a novel handcrafted feature descriptor, namely Weber’s law based Volume Local Gradient Ternary Pattern (WVLGTP), which brings out the spatiotemporal features. It also considers the shape information by using a gradient operation. Furthermore, a Weber’s law based threshold value and a ternary pattern based on an adaptive local threshold are presented to effectively handle the noisy center pixel value. Besides, a multi-resolution approach for WVLGTP based on an averaging scheme is also presented. Afterward, both these extracted features are concatenated and fed to the Support Vector Machine to perform the classification. Lastly, the extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.


“…Using 3D convolutional network to carry out the above operations will lead to a significant increase in time and space complexity. In order to effectively solve the above problems, the convolutional neural network is comprehensively improved [7,8]. After the improvement, multiple image features can be input at the same time, and the convolutional kernel is two-dimensional, the overall complexity of the algorithm is also effectively reduced, and the computational efficiency is greatly improved [9].…”

Section: Human Movement Recognition (mentioning)

confidence: 99%

Research on Human Motion Recognition Based on Data Redundancy Technology

2021


Aiming at the problems of low recognition rate and slow recognition speed of traditional body action recognition methods, a human action recognition method based on data deduplication technology is proposed. Firstly, the data redundancy technology and perceptual hashing technology are combined to form an index, and the image is filtered by the structure, color, and texture features of the human action image to achieve image redundancy processing. Then, the depth feature of the processed image is extracted by a depth motion map; finally, feature recognition is carried out by a convolutional neural network to achieve human action recognition. The simulation results show that the proposed method can obtain the optimal recognition results and has strong robustness. At the same time, it also fully proves the importance of human motion recognition.


“…Every atomic action was denoted as a composite latent state consisting of a latent semantic attribute and a latent geometric attribute. In their work, hidden Markov model (HMM) with AdaBoost, dynamic temporal warping, and recur- Yao et al [17] studied parallel pair discriminant correlation analysis (PPDCA) to fuse the multi-temporal-scale information with a lower dimension. However, the multi-temporal-scale in this method means features related to different numbers of frames.…”

Section: Related Work (mentioning)

confidence: 99%

Human Action Recognition Based on Multi-scale Feature Maps from Depth Video Sequences

Huang et al. 2021. Preprint.


Human action recognition is an active research area in computer vision. Although great progress has been made, previous methods mostly recognize actions based on depth data at only one scale, and thus they often neglect multi-scale features that provide additional information for action recognition in practical application scenarios. In this paper, we present a novel framework focusing on multi-scale motion information to recognize human actions from depth video sequences. We propose a multi-scale feature map called Laplacian pyramid depth motion images (LP-DMI). We employ depth motion images (DMI) as the templates to generate the multi-scale static representation of actions. Then, we calculate LP-DMI to enhance multi-scale dynamic information of motions and reduce redundant static information in human bodies. We further extract the multi-granularity descriptor called LP-DMI-HOG to provide more discriminative features. Finally, we utilize an extreme learning machine (ELM) for action classification. The proposed method yields recognition accuracies of 93.41%, 85.12%, and 91.94% on the public MSRAction3D, UTD-MHAD, and DHA datasets. Through extensive experiments, we prove that our method outperforms state-of-the-art benchmarks.
