SubUNets: End-to-End Hand Shape and Continuous Sign Language Recognition

Camgöz, Necati Cihan; Hadfield, Simon; Koller, Oscar; Bowden, Richard

doi:10.1109/iccv.2017.332

Cited by 244 publications

(125 citation statements)

References 28 publications

Mentioning: 113


“…As future work, it would be interesting to extend the attention mechanisms to the spatial domain to align building blocks of signs, also known as subunits, with their spoken language translations. It may also be possible to use an approach similar to SubUNets [6] to inject specialist intermediate subunit knowledge, bridging the gap between S2T and S2G2T.…”

Section: Results (mentioning)

confidence: 99%

“…Until recently SLR methods have mainly used handcrafted intermediate representations [33,16] and the temporal changes in these features have been modelled using classical graph based approaches, such as Hidden Markov Models (HMMs) [58], Conditional Random Fields [62] or template based methods [5,48]. However, with the emergence of DL, SLR researchers have quickly adopted Convolutional Neural Networks (CNNs) [40] for manual [35,37] and non-manual [34] feature representation, and Recurrent Neural Networks (RNNs) for temporal modelling [6,36,17].…”

Section: Related Work (mentioning)

confidence: 99%

“…It has obtained state-of-the-art performance on several tasks in speech recognition [27,2] and clearly dominates hand writing recognition [26]. Computer vision researchers adopted CTC and applied it to weakly labeled visual problems, such as lip reading [3], action recognition [30], hand shape recognition [6] and CSLR [6,17].…”

Section: Related Work (mentioning)

confidence: 99%


Neural Sign Language Translation

Camgöz, Hadfield, Koller et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Self Cite

Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has considered SLR as a naive gesture recognition problem. SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language. In contrast, we introduce the Sign Language Translation (SLT) problem. Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar. We formalize SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge). This allows us to jointly learn the spatial representations, the underlying language model, and the mapping between sign and spoken language. To evaluate the performance of Neural SLT, we collected the first publicly available Continuous SLT dataset, RWTH-PHOENIX-Weather 2014T. It provides spoken language translations and gloss level annotations for German Sign Language videos of weather broadcasts. Our dataset contains over 0.95M frames with >67K signs from a sign vocabulary of >1K and >99K words from a German vocabulary of >2.8K. We report quantitative and qualitative results for various SLT setups to underpin future research in this newly established field. The upper bound for translation performance is calculated at 19.26 BLEU-4, while our end-to-end frame-level and gloss-level tokenization networks were able to achieve 9.58 and 18.13 respectively.




“…Some research studies are already investigating continuous sign language translation [59][60][61]. Our future plan is to embed our system into a continuous sign language translation system.…”

Section: Results (mentioning)

confidence: 99%

A Novel String Grammar Unsupervised Possibilistic C-Medians Algorithm for Sign Language Translation Systems

2017


Abstract: Sign language is a basic method for solving communication problems between deaf and hearing people. In order to communicate, deaf and hearing people normally use hand gestures, which include a combination of hand positioning, hand shapes, and hand movements. Thai Sign Language is the communication method for Thai hearing-impaired people. Our objective is to improve the dynamic Thai Sign Language translation method with a video captioning technique that does not require prior hand region detection and segmentation through using the Scale Invariant Feature Transform (SIFT) method and the String Grammar Unsupervised Possibilistic C-Medians (sgUPCMed) algorithm. This work is the first to propose the sgUPCMed algorithm to cope with the unsupervised generation of multiple prototypes in the possibilistic sense for string data. In our experiments, the Thai Sign Language data set (10 isolated sign language words) was collected from 25 subjects. The best average result within the constrained environment of the blind test data sets of signer-dependent cases was 89-91%, and the success rate of signer semi-independent cases was 81-85%, on average. For the blind test data sets of signer-independent cases, the best average classification rate was 77-80%. The average result of the system without a constrained environment was around 62-80% for the signer-independent experiments. To show that the proposed algorithm can be implemented in other sign languages, the American Sign Language (RWTH-BOSTON-50) data set, which consists of 31 isolated American Sign Language words, is also used in the experiment. The system provides 88.56% and 91.35% results on the validation set alone, and for both the training and validation sets, respectively.


“…Such applications include Continuous Sign Language Recognition [4] and Video Captioning [7]. To be able to train spatio-temporal deep networks using sequence level annotations, researchers adopted sequence-to-sequence learning methods from other fields, namely Connectionist Temporal Classification [18] from Speech Recognition [19] and Encoder-Decoder Networks [8] from the field of Neural Machine Translations [1].…”

Section: Introduction (mentioning)

confidence: 99%

Particle Filter Based Probabilistic Forced Alignment for Continuous Gesture Recognition

Camgöz, Hadfield, Bowden

2017

2017 IEEE International Conference on Computer Vision Workshops (ICCVW)

Self Cite


In this paper, we propose a novel particle filter based probabilistic forced alignment approach for training spatiotemporal deep neural networks using weak border level annotations. The proposed method jointly learns to localize and recognize isolated instances in continuous streams. This is done by drawing training volumes from a prior distribution of likely regions and training a discriminative 3D-CNN from this data. The classifier is then used to calculate the posterior distribution by scoring the training examples, and this posterior is used as the prior for the next sampling stage. We apply the proposed approach to the challenging task of large-scale user-independent continuous gesture recognition. We evaluate the performance on the popular ChaLearn 2016 Continuous Gesture Recognition (ConGD) dataset. Our method surpasses state-of-the-art results by obtaining 0.3646 and 0.3744 Mean Jaccard Index Score on the validation and test sets of ConGD, respectively. Furthermore, we participated in the ChaLearn 2017 Continuous Gesture Recognition Challenge and were ranked 3rd. It should be noted that our method is learner independent and can be easily combined with other approaches.

