Deep Sign: Enabling Robust Statistical Continuous Sign Language Recognition via Hybrid CNN-HMMs

Koller, Oscar; Zargaran, Sepehr; Ney, Hermann; Bowden, Richard

doi:10.1007/s11263-018-1121-3

Cited by 157 publications

(87 citation statements)

References 43 publications

(55 reference statements)

“…Some use colored gloves to ease hand and finger tracking [26]. Recent advances in machine learning - i.e., deep learning and convolutional neural networks (CNNs) - have improved state-of-the-art computer vision approaches [76], though lack of sufficient training data currently limits the use of modern Artificial Intelligence (AI) techniques in this problem space.…”

Section: Recognition and Computer Vision (mentioning)

confidence: 99%

Sign Language Recognition, Generation, and Translation

Bragg, Koller, Bellard, et al. 2019

The 21st International ACM SIGACCESS Conference on Computers and Accessibility

Self Cite


Developing successful sign language recognition, generation, and translation systems requires expertise in a wide range of fields, including computer vision, computer graphics, natural language processing, human-computer interaction, linguistics, and Deaf culture. Despite the need for deep interdisciplinary knowledge, existing research occurs in separate disciplinary silos, and tackles separate portions of the sign language processing pipeline. This leads to three key questions: 1) What does an interdisciplinary view of the current landscape reveal? 2) What are the biggest challenges facing the field? and 3) What are the calls to action for people working in the field? To help answer these questions, we brought together a diverse group of experts for a two-day workshop. This paper presents the results of that interdisciplinary workshop, providing key background that is often overlooked by computer scientists, a review of the state-of-the-art, a set of pressing challenges, and a call to action for the research community.


“…Huang et al [15] learn a hand detector based on Faster R-CNN [33] using manually annotated signing hand bounding boxes, and apply it to general sign language recognition. Some sign language recognition approaches use no hand or pose preprocessing as a separate step (e.g., [22]), and indeed many signs involve large motions that do not require fine-grained gesture understanding. However, for fingerspelling recognition it is particularly important to understand fine-grained distinctions in handshape.…”

Section: Related Work (mentioning)

confidence: 99%

Fingerspelling Recognition in the Wild With Iterative Visual Attention

Shi, Rio, Keane, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Sign language recognition is a challenging gesture sequence recognition problem, characterized by quick and highly coarticulated motion. In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. Most previous work on sign language recognition has focused on controlled settings where the data is recorded in a studio environment and the number of signers is limited. Our work aims to address the challenges of real-life data, reducing the need for detection or segmentation modules commonly used in this domain. We propose an end-to-end model based on an iterative attention mechanism, without explicit hand detection or segmentation. Our approach dynamically focuses on increasingly high-resolution regions of interest. It outperforms prior work by a large margin. We also introduce a newly collected data set of crowdsourced annotations of fingerspelling in the wild, and show that performance can be further improved with this additional data set.


“…To tackle the problem, we exploit weak labels covering three modalities, namely gesture, mouth shape and hand shape and exploit the fact that all three contain sequential information with loose time synchronisation with respect to each other. We extend our previous work on hybrid HMM modelling for sign language recognition [3] [4] [5] by adding multi-stream HMMs with synchronisation constraints. The hybrid HMM modelling has shown to outperform other sequence learning approaches on sign language recognition data sets while requiring less memory and allowing for deeper architectures [4].…”

Section: Related Work (mentioning)

confidence: 99%

Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos

Koller, Camgöz, Ney, et al. 2020

IEEE Trans. Pattern Anal. Mach. Intell.

Self Cite


In this work we present a new approach to the field of weakly supervised learning in the video domain. Our method is relevant to sequence learning problems which can be split up into sub-problems that occur in parallel. Here, we experiment with sign language data. The approach exploits sequence constraints within each independent stream and combines them by explicitly imposing synchronisation points to make use of parallelism that all sub-problems share. We do this with multi-stream HMMs while adding intermediate synchronisation constraints among the streams. We embed powerful CNN-LSTM models in each HMM stream following the hybrid approach. This allows the discovery of attributes which on their own lack sufficient discriminative power to be identified. We apply the approach to the domain of sign language recognition exploiting the sequential parallelism to learn sign language, mouth shape and hand shape classifiers. We evaluate the classifiers on three publicly available benchmark data sets featuring challenging real-life sign language with over 1000 classes, full sentence based lipreading and articulated hand shape recognition on a fine-grained hand shape taxonomy featuring over 60 different hand shapes. We clearly outperform the state-of-the-art on all data sets and observe significantly faster convergence using the parallel alignment approach.
