2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.456
Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks

Cited by 500 publications (356 citation statements)
References 18 publications
“…In this section, we will elaborately explain our experiments and their results on our 3D residual attention network for hand gesture recognition in different video datasets. The performance of the Res3ATN network is tested on three open sourced datasets: EgoGesture [60], Jester [1], and NVIDIA Dynamic Hand Gesture dataset [38]. We compare and evaluate the performance of the Res3ATN with three other networks, i.e., C3D, ResNet-10, ResNext-101.…”
Section: Methods (confidence: 99%)
“…Neverova et al. [20] proposed a multi-scale architecture that could handle varied gesture duration using all the modalities present in the ChaLearn dataset. More recently, 3D CNNs and recurrent neural networks (RNNs) were used for gesture recognition by Molchanov et al. [17], setting benchmark performance for a dataset they introduced and the state of the art for the ChaLearn dataset. RNNs were used in this work to deal with varied gesture duration instead of the multi-scale approach proposed in [20].…”
Section: Previous Work (confidence: 99%)
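The 3D-CNN-plus-RNN combination the excerpt describes can be sketched as follows. This is a minimal illustration, not the architecture of Molchanov et al. [17]: the layer widths, clip shape, class count, and the choice of a GRU over other recurrent cells are all assumptions made for brevity.

```python
import torch
import torch.nn as nn

class R3DCNNSketch(nn.Module):
    """Sketch of a recurrent 3D-CNN gesture classifier.

    A small 3D conv stack extracts spatiotemporal features from short
    clips; a GRU aggregates clip features over time, which lets the
    model cope with gestures of varying duration. All sizes are
    illustrative assumptions.
    """

    def __init__(self, num_classes=25, hidden=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global pool to one feature vector per clip
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):
        # clips: (batch, n_clips, channels, frames, height, width)
        b, n = clips.shape[:2]
        feats = self.features(clips.flatten(0, 1)).flatten(1)  # (b*n, 64)
        out, _ = self.rnn(feats.view(b, n, -1))                # (b, n, hidden)
        return self.head(out)  # per-clip class scores: (b, n, num_classes)
```

Emitting a score per clip, rather than one per video, is what allows online (per-frame-group) prediction as new clips arrive.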
“…Cao et al. [3] proposed an end-to-end learnable network that used 3D CNNs in conjunction with STTMs and LSTMs, achieving state-of-the-art classification accuracy on the EgoGesture dataset. Given the successful application of 3D CNNs and RNNs to gesture recognition in [17], Cao et al. extended the network architecture to include STTMs and Recurrent STTMs (RSSTMs). Inspired by spatial transformer networks [12], STTMs transform a 3D feature map to compensate for the ego-motion introduced by head movement.…”
Section: Previous Work (confidence: 99%)
“…Simple heuristics such as using a fixed number of frames as the threshold for triggering a prediction do not work well, because gestures have different durations: the "clockwise" gesture in the NVIDIA gesture dataset [15] has a mean duration of 0.8 seconds, while "one finger tap" has a mean of 0.4 seconds. Even the same gesture can be performed at varying speeds.…”
Section: Introduction (confidence: 99%)
“…Molchanov et al. [15] explored early gesture detection using connectionist temporal classification (CTC) [8]. The CTC loss function enables gesture detection without requiring frame-level annotations, which is useful because such annotation is time-consuming and expensive.…”
Section: Introduction (confidence: 99%)
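The appeal of CTC here is weak supervision: training needs only the ordered sequence of gesture labels per video, not per-frame alignments. A minimal sketch with PyTorch's `nn.CTCLoss` (the shapes, label values, and blank index are illustrative assumptions, not values from the paper):

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 50 frames, batch of 2, 10 gesture classes + blank (index 0).
T, B, C = 50, 2, 11
logits = torch.randn(T, B, C, requires_grad=True)  # stand-in for network output
log_probs = logits.log_softmax(2)                  # CTC expects log-probabilities

# Only the ordered gesture labels per video are needed, not frame-level alignment.
targets = torch.tensor([[3, 7], [5, 5]])
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 2, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients reach the network without per-frame labels
```

CTC marginalizes over all frame-to-label alignments, inserting the blank symbol between gestures, which is what removes the need for expensive frame-level annotation.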