Genetic Programming-Evolved Spatio-Temporal Descriptor for Human Action Recognition

Liu, Li; Shao, Ling; Rockett, Peter

doi:10.5244/c.26.18

Cited by 16 publications

(13 citation statements)

References 34 publications

(21 reference statements)


“…The first group combines hand-crafted skeleton features and graphical models to recognize actions. The spatio-temporal representations from skeleton sequences are often modeled by several common probabilistic graphical models such as Hidden Markov Model (HMM) (Lv and Nevatia, 2006; Wang et al., 2012; Yang et al., 2013), Latent Dirichlet Allocation (LDA) (Blei et al., 2003; Liu et al., 2012) or Conditional Random Field (CRF) (Koppula and Saxena, 2013). In addition, Fourier Temporal Pyramid (FTP) (Wang et al., 2012; Vemulapalli et al., 2014; Hu et al., 2015) has also been used to capture the temporal dynamics of actions and then to predict their labels.…”

Section: Related Work (mentioning)

confidence: 99%

Exploiting deep residual networks for human action recognition from skeletal data

Pham

Khoudour

Crouzil

et al. 2018

Computer Vision and Image Understanding


The computer vision community is currently focusing on solving action recognition problems in real videos, which contain thousands of samples with many challenges. In this process, Deep Convolutional Neural Networks (D-CNNs) have played a significant role in advancing the state-of-the-art in various vision-based action recognition systems. Recently, the introduction of residual connections in conjunction with a more traditional CNN model in a single architecture called Residual Network (ResNet) has shown impressive performance and great potential for image recognition tasks. In this paper, we investigate and apply deep ResNets for human action recognition using skeletal data provided by depth sensors. Firstly, the 3D coordinates of the human body joints carried in skeleton sequences are transformed into image-based representations and stored as RGB images. These color images are able to capture the spatial-temporal evolutions of 3D motions from skeleton sequences and can be efficiently learned by D-CNNs. We then propose a novel deep learning architecture based on ResNets to learn features from obtained color-based representations and classify them into action classes. The proposed method is evaluated on three challenging benchmark datasets including MSR Action 3D, KARD, and NTU-RGB+D datasets. Experimental results demonstrate that our method achieves state-of-the-art performance for all these benchmarks whilst requiring less computation resource. In particular, the proposed method surpasses previous approaches by a significant margin of 3.4% on MSR Action 3D dataset, 0.67% on KARD dataset, and 2.5% on NTU-RGB+D dataset.

“…challenging task due to many obstacles such as viewpoint, occlusion or lighting conditions (Poppe, 2010). Traditional studies on HAR mainly focus on the use of handcrafted local features such as Cuboids (Dollár et al., 2005) or HOG/HOF (Laptev et al., 2008) that are provided by 2D cameras. These approaches typically recognize human actions based on the appearance and movements of human body parts in videos. Another approach is to use Genetic Programming (GP) for generating spatio-temporal descriptors of motions. However, one of the major limitations of the 2D data is the absence of 3D structure from the scene.…”

arXiv:1803.07781v1 [cs.CV]
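The abstract above says that the 3D joint coordinates of a skeleton sequence are transformed into an image-based representation and stored as an RGB image. A minimal sketch of one way such an encoding could work follows. The specific mapping (x, y, z to R, G, B, joints as rows, frames as columns, global min-max scaling) and the function name `skeleton_to_rgb` are assumptions made for illustration, not details confirmed by the source.

```python
def skeleton_to_rgb(sequence):
    """Encode a skeleton sequence as a grid of RGB pixels.

    sequence: list of frames; each frame is a list of (x, y, z) joint positions.
    Returns a list of rows (one per joint), each a list of (R, G, B) tuples
    (one per frame). The layout and scaling are illustrative assumptions.
    """
    # Global min and max over every coordinate, used for min-max scaling.
    coords = [c for frame in sequence for joint in frame for c in joint]
    lo, hi = min(coords), max(coords)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input

    def to_byte(value):
        # Scale a coordinate into the 0..255 range of one colour channel.
        return int(round(255 * (value - lo) / span))

    n_frames = len(sequence)
    n_joints = len(sequence[0])
    return [
        [tuple(to_byte(c) for c in sequence[t][j]) for t in range(n_frames)]
        for j in range(n_joints)
    ]


# Two frames, two joints per frame.
frames = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    [(1.0, 1.0, 1.0), (0.0, 1.0, 0.0)],
]
image = skeleton_to_rgb(frames)
```

In this layout each column is a moment in time and each row follows one joint, so motion shows up as colour change along a row. An image of this kind could then be passed to any image classifier.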


Section: Related Work (mentioning)

confidence: 99%

“…In [16], an aggregation function (average value) is tested, which, in fact, outperforms other approaches based on decision-level fusion. Conversely, in [25], single-view features are successfully joined using a concatenation of vectors and, therefore, preserving all the characteristic data. More sophisticated techniques can also be found, as in [26], where canonical correlation analysis is employed.…”
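The statement above contrasts two ways of combining per-view feature vectors: averaging them and concatenating them. The sketch below illustrates the difference on plain Python lists. The function names `fuse_average` and `fuse_concat` are hypothetical, and treating the average as an element-wise mean over equal-length vectors is an assumption about what the cited work does.

```python
def fuse_average(views):
    """Element-wise mean of equal-length feature vectors, one per camera view."""
    n = len(views)
    return [sum(values) / n for values in zip(*views)]


def fuse_concat(views):
    """Concatenate the per-view vectors, keeping every original value."""
    return [value for view in views for value in view]


view_a = [1.0, 2.0]
view_b = [3.0, 4.0]
averaged = fuse_average([view_a, view_b])    # same length as a single view
concatenated = fuse_concat([view_a, view_b])  # length grows with the number of views
```

Averaging keeps the feature dimension fixed but blends the views, whereas concatenation preserves all per-view data at the cost of a longer vector.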

Section: Related Work (mentioning)

confidence: 99%

A Vision-Based System for Intelligent Monitoring: Human Behaviour Analysis and Privacy by Context

Chaaraoui

Padilla-López

Ferrández-Pastor

et al. 2014

Sensors


Due to progress and demographic change, society is facing a crucial challenge related to increased life expectancy and a higher number of people in situations of dependency. As a consequence, there exists a significant demand for support systems for personal autonomy. This article outlines the vision@home project, whose goal is to extend independent living at home for elderly and impaired people, providing care and safety services by means of vision-based monitoring. Different kinds of ambient-assisted living services are supported, from the detection of home accidents, to telecare services. In this contribution, the specification of the system is presented, and novel contributions are made regarding human behaviour analysis and privacy protection. By means of a multi-view setup of cameras, people's behaviour is recognised based on human action recognition. For this purpose, a weighted feature fusion scheme is proposed to learn from multiple views. In order to protect the right to privacy of the inhabitants when a remote connection occurs, a privacy-by-context method is proposed. The experimental results of the behaviour recognition method show an outstanding performance, as well as support for multi-view scenarios and real-time execution, which are required in order to provide the proposed services.


“…In this paper, we use GP to automatically synthesize spatio-temporal descriptors from a set of 3D filters and operators for dynamic hand gesture recognition. A simplified version of our method has been applied to extract features for action recognition in [18].…”

Section: Related Work (mentioning)

confidence: 99%

Synthesis of spatio-temporal descriptors for dynamic hand gesture recognition using genetic programming

Liu

Shao

2013

2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)

Self Cite


Abstract: Automatic gesture recognition has received much attention due to its potential in various applications. In this paper, we successfully apply an evolutionary method, genetic programming (GP), to synthesize machine-learned spatio-temporal descriptors for automatic gesture recognition instead of using hand-crafted descriptors. In our architecture, a set of primitive low-level 3D operators are first randomly assembled as tree-based combinations, which are further evolved generation by generation through the GP system, and finally a well-performing combination will be selected as the best descriptor for high-level gesture recognition. To the best of our knowledge, this is the first report of using GP to evolve spatio-temporal descriptors for gesture recognition. We address this as a domain-independent optimization issue and evaluate our proposed method, respectively, on two public dynamic gesture datasets: Cambridge hand gesture dataset and Northwestern University hand gesture dataset to demonstrate its generalizability. The experimental results manifest that our GP-evolved descriptors can achieve better recognition accuracies than state-of-the-art hand-crafted techniques.
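The abstract above outlines the GP loop: primitive operators are randomly assembled into trees, the trees are evolved generation by generation, and the best-performing tree is selected. The toy sketch below shows that loop on scalar arithmetic rather than 3D video operators. Everything specific in it is an assumption made for illustration: the primitive set, the squared-error fitness, truncation selection with mutation only (no crossover), and all function names.

```python
import random

# Hypothetical stand-ins for the paper's primitive 3D operators.
UNARY = {"neg": lambda a: -a, "abs": abs}
BINARY = {"add": lambda a, b: a + b,
          "sub": lambda a, b: a - b,
          "mul": lambda a, b: a * b}


def random_tree(rng, depth):
    """Randomly assemble primitives into a tree; "x" is the input terminal."""
    if depth == 0 or rng.random() < 0.3:
        return "x"
    if rng.random() < 0.3:
        return (rng.choice(sorted(UNARY)), random_tree(rng, depth - 1))
    return (rng.choice(sorted(BINARY)),
            random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Apply the operator tree to the input x."""
    if tree == "x":
        return x
    if len(tree) == 2:
        return UNARY[tree[0]](evaluate(tree[1], x))
    return BINARY[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def mutate(rng, tree):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if tree == "x" or rng.random() < 0.3:
        return random_tree(rng, 2)
    i = rng.randrange(1, len(tree))
    return tree[:i] + (mutate(rng, tree[i]),) + tree[i + 1:]


def fitness(tree, samples):
    """Toy fitness: negative squared error on (input, target) pairs."""
    return -sum((evaluate(tree, x) - y) ** 2 for x, y in samples)


def evolve(samples, pop_size=30, generations=20, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, samples), reverse=True)
        history.append(fitness(population[0], samples))
        survivors = population[: pop_size // 2]  # fitter half survives unchanged
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=lambda t: fitness(t, samples))
    return best, history


samples = [(x, x * x + x) for x in range(-3, 4)]  # toy target: x^2 + x
best, history = evolve(samples)
```

Because the fitter half of each generation is carried over unchanged, the best fitness recorded in `history` never decreases. In the paper's setting the fitness would instead come from recognition accuracy of the evolved descriptor, which this sketch does not attempt to model.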

