Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data is Continuous and Weakly Labelled

Koller, Oscar; Ney, Hermann; Bowden, Richard

doi:10.1109/cvpr.2016.412

Cited by 215 publications

(174 citation statements)

References 39 publications

Mentioning: 164

“…As shown in Table 3, our Hand SubUNet surpasses the hand shape recognition performance of the state-of-the-art CNN-based method proposed by Koller et al [27], by a margin of 18% Top-1 accuracy, which is a relative improvement of 30%. Koller et al [27] iteratively realigned and retrained his network whereas the SubUNet architecture automatically overcomes the frame alignment issue.…”

Section: Hand Subunet: End-to-end Hand Shape Recognition and Alignment (mentioning)

confidence: 91%

“…As frame level annotations are hard to come by in continuous datasets, most of the work to date required an alignment step to localize individual signs in videos [10]. The work that is most relevant to this paper is by Koller et al [27] which combines deep-representations with traditional HMM based temporal modelling.…”

Section: Related Work (mentioning)

confidence: 99%

“…To evaluate the performance of our network we used the 3361 manually annotated hand images provided by [27], which are from the Development set of the RWTH-PHOENIX-Weather-2014 dataset. Again, because we are interested in the more challenging alignment & recognition problem, we run the system on the full (unseen) test sequences from which these images were taken.…”

Section: Hand Subunet: End-to-end Hand Shape Recognition and Alignment (mentioning)

confidence: 99%

“…We use the One-Million Hands [27] dataset for training the Hand SubUNet. The dataset consists of cropped hand images collated from publicly available datasets, including Danish [29], New Zealand [31] and German (RWTH-PHOENIX-Weather-2014 [14]) sign languages.…”

Section: Hand Subunet: End-to-end Hand Shape Recognition and Alignment (mentioning)

confidence: 99%

SubUNets: End-to-End Hand Shape and Continuous Sign Language Recognition

Camgöz, Hadfield, Koller, et al. 2017

2017 IEEE International Conference on Computer Vision (ICCV)

Self Cite

We propose a novel deep learning approach to solve simultaneous alignment and recognition problems (referred to as "Sequence-to-sequence" learning). We decompose the problem into a series of specialised expert systems referred to as SubUNets. The spatio-temporal relationships between these SubUNets are then modelled to solve the task, while remaining trainable end-to-end. The approach mimics human learning and educational techniques, and has a number of significant advantages. SubUNets allow us to inject domain-specific expert knowledge into the system regarding suitable intermediate representations. They also allow us to implicitly perform transfer learning between different interrelated tasks, which also allows us to exploit a wider range of more varied data sources. In our experiments we demonstrate that each of these properties serves to significantly improve the performance of the overarching recognition system, by better constraining the learning problem. The proposed techniques are demonstrated in the challenging domain of sign language recognition. We demonstrate state-of-the-art performance on hand-shape recognition (outperforming previous techniques by more than 30%). Furthermore, we are able to obtain comparable sign recognition rates to previous research, without the need for an alignment step to segment out the signs for recognition.

“…In [24], Koller et al propose a CNN-HMM hybrid that learns to localize and recognize hand shapes. They first train a CNN using weak frame level annotations.…”

Section: Introduction (mentioning)

confidence: 99%

Particle Filter Based Probabilistic Forced Alignment for Continuous Gesture Recognition

Camgöz, Hadfield, Bowden 2017

2017 IEEE International Conference on Computer Vision Workshops (ICCVW)

Self Cite

In this paper, we propose a novel particle filter based probabilistic forced alignment approach for training spatiotemporal deep neural networks using weak border level annotations. The proposed method jointly learns to localize and recognize isolated instances in continuous streams. This is done by drawing training volumes from a prior distribution of likely regions and training a discriminative 3D-CNN from this data. The classifier is then used to calculate the posterior distribution by scoring the training examples and using this as the prior for the next sampling stage. We apply the proposed approach to the challenging task of large-scale user-independent continuous gesture recognition. We evaluate the performance on the popular ChaLearn 2016 Continuous Gesture Recognition (ConGD) dataset. Our method surpasses state-of-the-art results by obtaining 0.3646 and 0.3744 Mean Jaccard Index Score on the validation and test sets of ConGD, respectively. Furthermore, we participated in the ChaLearn 2017 Continuous Gesture Recognition Challenge and were ranked 3rd. It should be noted that our method is learner independent; it can be easily combined with other approaches.

A State-of-Art Review on Automatic Video Annotation Techniques

Randive, Mohan 2019

Advances in Intelligent Systems and Computing
