2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.456
Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks

Cited by 500 publications (356 citation statements)
References 18 publications
“…In this section, we will elaborately explain our experiments and their results on our 3D residual attention network for hand gesture recognition in different video datasets. The performance of the Res3ATN network is tested on three open sourced datasets: EgoGesture [60], Jester [1], and NVIDIA Dynamic Hand Gesture dataset [38]. We compare and evaluate the performance of the Res3ATN with three other networks, i.e., C3D, ResNet-10, ResNext-101.…”
Section: Methods (confidence: 99%)
“…Neverova et al. [20] proposed a multi-scale architecture that could handle varied gesture duration using all the modalities present in the ChaLearn dataset. More recently, 3D CNNs and recurrent neural networks (RNNs) were used for gesture recognition by Molchanov et al. [17], setting benchmark performance for a dataset they introduced and the state of the art for the ChaLearn dataset. RNNs were used in this work to deal with varied gesture duration instead of the multi-scale approach proposed in [20].…”
Section: Previous Work (confidence: 99%)
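The 3D-CNN-plus-RNN combination the excerpt describes can be sketched as follows. This is a minimal illustration, not the architecture of Molchanov et al. [17]: the layer widths, clip shape, class count, and the choice of a GRU over other recurrent cells are all assumptions made for brevity.

```python
import torch
import torch.nn as nn

class R3DCNNSketch(nn.Module):
    """Sketch of a recurrent 3D-CNN gesture classifier.

    A small 3D conv stack extracts spatiotemporal features from short
    clips; a GRU aggregates clip features over time, which lets the
    model cope with gestures of varying duration. All sizes are
    illustrative assumptions.
    """

    def __init__(self, num_classes=25, hidden=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global pool to one feature vector per clip
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):
        # clips: (batch, n_clips, channels, frames, height, width)
        b, n = clips.shape[:2]
        feats = self.features(clips.flatten(0, 1)).flatten(1)  # (b*n, 64)
        out, _ = self.rnn(feats.view(b, n, -1))                # (b, n, hidden)
        return self.head(out)  # per-clip class scores: (b, n, num_classes)
```

Emitting a score per clip, rather than one per video, is what allows online (per-frame-group) prediction as new clips arrive.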
“…Cao et al. [3] proposed an end-to-end learnable network that used 3D CNNs in conjunction with STTMs and LSTMs, achieving state-of-the-art classification accuracy on the EgoGesture dataset. Given the successful application of 3D CNNs and RNNs to gesture recognition in [17], Cao et al. extended the network architecture to include STTMs and Recurrent STTMs (RSSTMs). Inspired by spatial transformer networks [12], STTMs transform a 3D feature map to compensate for the ego-motion introduced by head movement.…”
Section: Previous Work (confidence: 99%)
“…Simple heuristics such as using a fixed number of frames as the threshold for triggering a prediction do not work well, because gestures have different durations: the "clockwise" gesture in the NVIDIA gesture dataset [15] has a mean duration of 0.8 seconds, while "one finger tap" has a mean of 0.4 seconds. Even the same gesture can be performed at varying speeds.…”
Section: Introduction (confidence: 99%)
“…Molchanov et al. [15] explored early gesture detection using connectionist temporal classification (CTC) [8]. The CTC loss function enables gesture detection without requiring frame-level annotations, which is useful because such annotation is time-consuming and expensive.…”
Section: Introduction (confidence: 99%)
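The appeal of CTC here is weak supervision: training needs only the ordered sequence of gesture labels per video, not per-frame alignments. A minimal sketch with PyTorch's `nn.CTCLoss` (the shapes, label values, and blank index are illustrative assumptions, not values from the paper):

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 50 frames, batch of 2, 10 gesture classes + blank (index 0).
T, B, C = 50, 2, 11
logits = torch.randn(T, B, C, requires_grad=True)  # stand-in for network output
log_probs = logits.log_softmax(2)                  # CTC expects log-probabilities

# Only the ordered gesture labels per video are needed, not frame-level alignment.
targets = torch.tensor([[3, 7], [5, 5]])
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 2, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients reach the network without per-frame labels
```

CTC marginalizes over all frame-to-label alignments, inserting the blank symbol between gestures, which is what removes the need for expensive frame-level annotation.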