Deep Learning for Action and Gesture Recognition in Image Sequences: A Survey

Asadi-Aghbolaghi, Maryam; Clapés, Albert; Bellantonio, Marco; Ponce-López, Víctor; Baró, Xavier; Guyon, Isabelle; Kasaei, Shohreh; Escalera, Sérgio

doi:10.1007/978-3-319-57021-1_19

Cited by 42 publications

(21 citation statements)

References 145 publications


“…Moreover, deep facial recognition [45], gesture recognition [46], crowd detection [16], crowd behavior analysis [47], crime scene analysis [48], etc. using machine learning techniques have been subjects of great interest beyond computer science.…”

Section: The Smart City and Crime (mentioning)

confidence: 99%

Enhancing City Sustainability through Smart Technologies: A Framework for Automatic Pre-Emptive Action to Promote Safety and Security Using Lighting and ICT-Based Surveillance

2020


The scope of the present paper is to promote social, cultural and environmental sustainability in cities by establishing a conceptual framework and the relationship amongst safety in urban public space (UPS), lighting and Information and Communication Technology (ICT)-based surveillance. This framework uses available technologies and tools, as these can be found in urban equipment such as lighting posts, to enhance security and safety in UPS, ensuring protection against attempted criminal activity. A detailed literature review shows that publications on security and safety concerning crime and lighting can be divided into two periods, the first one pre-1994, and the second one from 2004 to 2008. Since then, a significant reduction in the number of publications dealing with lighting and crime is observed, while at the same time, the urban nightscape has been reshaped with the immersion of light-emitting diode (LED) technologies. Especially in the last decade, where most municipalities in the EU28 (European Union of all the member states from the accession of Croatia in 2013 to the withdrawal of the United Kingdom in 2020) are refurbishing their road lighting with LED technology and the consideration of smart networks and surveillance is under development, the use of lighting to deter possible attempted felonies in UPS is not addressed. To capitalize on the potential of lighting as a deterrent, this paper proposes a framework that uses existing technology, namely, dimmable LED light sources, presence sensors and security cameras, as well as emerging techniques such as artificial intelligence (AI)-enabled image recognition algorithms and big data analytics. It also presents a possible system that could be developed as a stand-alone product to alert to possibly dangerous situations, deter criminal activity and promote the perception of safety, thus linking lighting and ICT-based surveillance towards safety and security in UPS.



“…We also review some recent methods of feature augmentation. More comprehensive reviews on hand gesture recognition are found in [36,37,38,39].…”

Section: Related Work (mentioning)

confidence: 99%

MFA-Net: Motion Feature Augmented Network for Dynamic Hand Gesture Recognition from Skeletal Data

Chen, Wang, Guo et al. 2019

Sensors


Dynamic hand gesture recognition has attracted increasing attention because of its importance for human–computer interaction. In this paper, we propose a novel motion feature augmented network (MFA-Net) for dynamic hand gesture recognition from skeletal data. MFA-Net exploits motion features of finger and global movements to augment the features of a deep network for gesture recognition. To describe finger articulated movements, finger motion features are extracted from the hand skeleton sequence via a variational autoencoder. Global motion features are utilized to represent the global movements of the hand skeleton. These motion features along with the skeleton sequence are then fed into three branches of a recurrent neural network (RNN), which augment the motion features for the RNN and improve the classification performance. The proposed MFA-Net is evaluated on two challenging skeleton-based dynamic hand gesture datasets, the DHG-14/28 dataset and the SHREC’17 dataset. Experimental results demonstrate that our proposed method achieves comparable performance on the DHG-14/28 dataset and better performance on the SHREC’17 dataset when compared with state-of-the-art methods.


“…In [17], deep architectures used for action recognition are categorized in four groups: 2D models, motion-based input features, 3D models and temporal networks. In the first group, [18] uses a pre-trained model on one or more frames which are sampled from the whole video.…”

Section: B Two-stream I3D (mentioning)

confidence: 99%

“…Therefore, we only focus on the cross-subject evaluation. In the cross-subject evaluation, samples of subjects 1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19, 25, 27, 28, 31, 34, 35 and 38 were used for training, and samples of the remaining subjects were reserved for testing.…”

Section: A Datasets (mentioning)

confidence: 99%

Action Tube Extraction Based 3D-CNN for RGB-D Action Recognition

Morros

2018

2018 International Conference on Content-Based Multimedia Indexing (CBMI)


In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes as input a video and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD and its role is to define the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and is designed to remove frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: 1) a higher ratio of ROI (subjects of action) to background; 2) most frames contain obvious motion change. We propose to use a two-stream (RGB and Depth) I3D architecture as our 3D-CNN model. Our approach outperforms the state-of-the-art methods on the OA and NTU RGB-D datasets.

