Dynamic Hand Gesture Recognition Using 3DCNN and LSTM with FSM Context-Aware Model

Hakim, Noorkholis Luthfil; Shih, Timothy K.; Arachchi, S. P. Kasthuri; Aditya, Wisnu; Chen, Yi Cheng; Lin, Chen‐Yuan

doi:10.3390/s19245429

Cited by 51 publications

(26 citation statements)

References 41 publications

(45 reference statements)


“…In [31], a convolutional LSTM-VideoLSTM was used to learn spatio-temporal features from previously extracted spatial features. In [32] the proposed model is a combination of a three-dimensional convolutional neural network (3DCNN) and long short-term memory (LSTM) and used to extract the spatio-temporal features from the dataset containing RGB and depth images. In [33], spatiotemporal features were extracted in parallel utilizing a 3D convolutional neural network (3DCNN).…”

Section: Related Work (mentioning)

confidence: 99%

A Dynamic Gesture Recognition Interface for Smart Home Control based on Croatian Sign Language

et al. 2020


Deaf and hard-of-hearing people are facing many challenges in everyday life. Their communication is based on the use of a sign language, and the ability of the cultural/social environment to fully understand such a language defines whether or not it will be accessible for them. Technology is a key factor that has the potential to provide solutions to achieve a higher accessibility and therefore improve the quality of life of deaf and hard-of-hearing people. In this paper, we introduce a smart home automatization system specifically designed to provide real-time sign language recognition. The contribution of this paper implies several elements. Novel hierarchical architecture is presented, including resource-and-time-aware modules—a wake-up module and high-performance sign recognition module based on the Conv3D network. To achieve high-performance classification, multi-modal fusion of RGB and depth modality was used with the temporal alignment. Then, a small Croatian sign language database containing 25 different language signs for the use in smart home environment was created in collaboration with the deaf community. The system was deployed on a Nvidia Jetson TX2 embedded system with StereoLabs ZED M stereo camera for online testing. Obtained results demonstrate that the proposed practical solution is a viable approach for real-time smart home control.


“…Considering the advantages of combining CNN and LSTM networks, our baseline two-stream architecture Dual-3DCNNLSTM, which was also used to classify hand gestures in our IC4You project, consists of a 3DCNN network followed by a stack LSTM layer [16]. The filter size of each Conv3D layer is 3 × 3 × 3, and the stride and padding are 1 × 1 × 1.…”

Section: Dual-3DCNNLSTM Model (mentioning)

confidence: 99%
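
The layer settings quoted above (3 × 3 × 3 filters with stride and padding of 1 × 1 × 1) imply that each Conv3D layer preserves the frame count, height, and width of its input. The following is a minimal sketch of that arithmetic using the standard convolution output-size formula with no dilation. The function name `conv3d_output_shape` and the 16 × 112 × 112 clip size are illustrative assumptions, not values taken from the cited paper.

```python
def conv3d_output_shape(in_shape, kernel=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)):
    """Return the (frames, height, width) produced by one 3D convolution.

    Applies out = floor((in + 2 * padding - kernel) / stride) + 1 per axis,
    assuming no dilation. Defaults mirror the quoted layer settings.
    """
    return tuple(
        (size + 2 * p - k) // s + 1
        for size, k, s, p in zip(in_shape, kernel, stride, padding)
    )


# Hypothetical input clip: 16 frames of 112 x 112 pixels (assumed, not from the paper).
clip_shape = (16, 112, 112)
print(conv3d_output_shape(clip_shape))
```

With these settings the output shape equals the input shape, so any reduction in resolution would have to come from pooling or other layers that the quoted passage does not describe.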

“…Later, during the preprocessing step, we extracted the hand from the body to input to the model. Gesture videos were recorded from 20 individuals using 11 dynamic gestures, click, grab, scroll-down, scroll-up, scroll-right, scroll-left, pinch, zoom out, zoom in, backward, and forward, that have been sampled and clearly visualized on the research web page [16]. The user needs to re-perform each gesture six times in a different manner.…”

Section: Hand Gestures Dataset (mentioning)

confidence: 99%

“…Moreover, the work of Molchanov et al [15] proposed a combination of 3DCNN and RNN, with the fully connected spatiotemporal features transferred into RNN. Inspired by this study, [16] proposed a combination of a 3DCNN and LSTM model to classify types of hand gestures.…”

Section: Introduction (mentioning)

confidence: 99%

“…Finally, we concatenate both spatial and motion features to evaluate the class probabilities to classify the dynamic patterns of input data. Since our study will classify the dynamic patterns of videos, this study considers fireworks [19] hand gestures [16], and human action videos from the HMDB51 dataset [20] to verify our proposed model. Furthermore, to demonstrate the significance of our proposed architecture, we conduct the performance examination under three datasets and standard HMDB51 benchmarking, finding that the performance boosts dramatically with the use of several state-of-the-art methods.…”

Section: Introduction (mentioning)

confidence: 99%


Modelling a Spatial-Motion Deep Learning Framework to Classify Dynamic Patterns of Videos

2020

Self Cite


Video classification is an essential process for analyzing the pervasive semantic information of video content in computer vision. Traditional hand-crafted features are insufficient when classifying complex video information due to the similarity of visual contents with different illumination conditions. Prior studies of video classifications focused on the relationship between the standalone streams themselves. In this paper, by leveraging the effects of deep learning methodologies, we propose a two-stream neural network concept, named state-exchanging long short-term memory (SE-LSTM). With the model of spatial motion state-exchanging, the SE-LSTM can classify dynamic patterns of videos using appearance and motion features. The SE-LSTM extends the general purpose of LSTM by exchanging the information with previous cell states of both appearance and motion stream. We propose a novel two-stream model Dual-CNNSELSTM utilizing the SE-LSTM concept combined with a Convolutional Neural Network, and use various video datasets to validate the proposed architecture. The experimental results demonstrate that the performance of the proposed two-stream Dual-CNNSELSTM architecture significantly outperforms other datasets, achieving accuracies of 81.62%, 79.87%, and 69.86% with hand gestures, fireworks displays, and HMDB51 datasets, respectively. Furthermore, the overall results signify that the proposed model is most suited to static background dynamic patterns classifications.


Literature review of vision‐based dynamic gesture recognition using deep learning techniques

Jain

Karsh

Barbhuiya

2022

Concurrency and Computation


Summary: Gesture recognition is the foremost need in building intelligent human‐computer interaction systems to solve many day‐to‐day problems and simplify human life in this digital world. The traditional machine learning (ML) algorithm tried to capture specific handcrafted features, failed miserably in some real‐world environments. Deep learning (DL) techniques have become a sensation among researchers in recent years, making the traditional ML approaches quite obsolete. However, existing reviews consider only a few datasets on which DL algorithm has been applied, and the categorization of the DL algorithms is vague in their review. This study provides the precise categorization of DL algorithms and considers around 15 gesture datasets on which these techniques have been applied. This study also provides a brief overview of the numerous challenging dataset available among the research community and insight into various challenges and limitations of a DL algorithm in vision‐based dynamic gesture recognition.
