Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

Li, Dongxu; Opazo, Cristian Rodriguez; Yu, Xin; Li, Hongdong

doi:10.1109/wacv45572.2020.9093512

Cited by 320 publications

(285 citation statements, 239 of them mentioning)

References 69 publications

“…In parallel to the success of the deep learning based models in other domains, many works in the SLR domain recently conduct research using deep neural networks. In these approaches, instead of hand-crafted feature extraction, Convolutional Neural Networks (CNNs) are utilized effectively [1], [3], [4], [10], [15], [34]- [37]. While some of these studies do not require any segmentation methods [1], [3], [4], [35], some studies prefer to use neural networks, such as Fast R-CNN and Faster R-CNN, in order to locate the hand region [15], [34], [36].…”

Section: Related Work (mentioning, confidence 99%)

“…It provides only 483 RGB samples in total. An extended list of sign language datasets can be found in [3], [42]. Montalbano Italian gesture dataset [43], which has recently become one of the most widely used isolated SLR datasets, contains 20 gestures and approximately 14,000 samples in total.…”

Section: A. Sign Language Datasets (mentioning, confidence 99%)

“…In the recent years, some studies use 3D-CNNs in order to capture spatial-temporal features together [2], [3], [37]. In [3], pose based and visual appearance based approaches are compared. They compare 2D-CNNs with RNNs and 3D-CNNs for visual appearance based baselines.…”

Section: A. Sign Language Datasets (mentioning, confidence 99%)

“…Since attention mechanisms achieve promising results in action recognition problem, it also attracts the researchers in the SLR domain. In [3], an attention based 3D-CNN network is proposed for CSL recognition. On the proposed method, they incorporate spatial attention into 3D-CNN to select skeleton joints of hand and the arm; spatial attention map peaks around these regions.…”

Section: A. Sign Language Datasets (mentioning, confidence 99%)
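The quoted passage describes spatial attention only at a high level. The sketch below is a generic illustration, not the network proposed in [3]: it scores every spatial position of one time slice of a convolutional feature map, normalizes the scores with a softmax so the resulting map peaks on the most relevant regions (such as the hands and arms), and pools the features with those weights. The function name `spatial_attention`, the dot-product scoring, and the learned vector `w` are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def spatial_attention(
    feature_map: List[List[List[float]]], w: List[float]
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical spatial attention over one time slice of a feature map.

    feature_map[i][j] is the C-dim feature vector at spatial position (i, j).
    w is an assumed learned scoring vector of length C.
    Returns (attention map of shape H x W, pooled C-dim descriptor).
    """
    height, width, channels = len(feature_map), len(feature_map[0]), len(w)

    # One scalar score per spatial position: dot product of its feature with w.
    scores = [
        [sum(f * wc for f, wc in zip(feature_map[i][j], w)) for j in range(width)]
        for i in range(height)
    ]

    # Softmax over all H * W positions; subtracting the max avoids overflow.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    attn = [[e / total for e in row] for row in exps]

    # Attention-weighted sum of the feature vectors gives one C-dim descriptor.
    pooled = [
        sum(attn[i][j] * feature_map[i][j][c] for i in range(height) for j in range(width))
        for c in range(channels)
    ]
    return attn, pooled


# Toy 2 x 2 map with 2 channels: only position (0, 0) responds to w.
attn_map, descriptor = spatial_attention(
    [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]], [5.0, 0.0]
)
print(attn_map, descriptor)
```

Because the weights sum to one, the pooled descriptor is a convex combination of position features, dominated by the positions where the map peaks.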

“…However, in order to train a deep learning based sign language recognition model, the amount of training data is crucial. In recent years, larger datasets have been published [2], [3], [21], which contain a large vocabulary size [21], large number of samples [3], with many signers [2]. These datasets help building practical SLR models.…”

Section: Introduction (mentioning, confidence 99%)

AUTSL: A Large Scale Multi-Modal Turkish Sign Language Dataset and Baseline Methods

2020


Sign language recognition is a challenging problem where signs are identified by simultaneous local and global articulations of multiple sources, i.e. hand shape and orientation, hand movements, body posture, and facial expressions. Solving this problem computationally for a large vocabulary of signs in real life settings is still a challenge, even with the state-of-the-art models. In this study, we present a new large-scale multi-modal Turkish Sign Language dataset (AUTSL) with a benchmark and provide baseline models for performance evaluations. Our dataset consists of 226 signs performed by 43 different signers and 38,336 isolated sign video samples in total. Samples contain a wide variety of backgrounds recorded in indoor and outdoor environments. Moreover, spatial positions and the postures of signers also vary in the recordings. Each sample is recorded with Microsoft Kinect v2 and contains color image (RGB), depth, and skeleton modalities. We prepared benchmark training and test sets for user independent assessments of the models. We trained several deep learning based models and provide empirical evaluations using the benchmark; we used Convolutional Neural Networks (CNNs) to extract features, unidirectional and bidirectional Long Short-Term Memory (LSTM) models to characterize temporal information. We also incorporated feature pooling modules and temporal attention to our models to improve the performances. We evaluated our baseline models on AUTSL and Montalbano datasets. Our models achieved competitive results with the state-of-the-art methods on Montalbano dataset, i.e. 96.11% accuracy. In AUTSL random train-test splits, our models performed up to 95.95% accuracy. In the proposed user-independent benchmark dataset our best baseline model achieved 62.02% accuracy. The gaps in the performances of the same baseline models show the challenges inherent in our benchmark dataset. AUTSL benchmark dataset is publicly available at https://cvml.ankara.edu.tr.
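The abstract describes benchmark sets built for user-independent assessment. One common way to obtain such a split, sketched here under the assumption that every sample is tagged with a signer ID, is to partition the signers rather than the videos, so that no signer appears in both the training and test sets. The helper `signer_independent_split` and its parameters are hypothetical; this does not reproduce the actual AUTSL partition.

```python
import random
from typing import List, Sequence, Set, Tuple

Sample = Tuple[str, str]  # (sample_id, signer_id)


def signer_independent_split(
    samples: Sequence[Sample], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample], Set[str]]:
    """Hypothetical helper: split samples so train and test share no signer."""
    # Sort first so the shuffle is reproducible for a given seed.
    signers = sorted({signer for _, signer in samples})
    rng = random.Random(seed)
    rng.shuffle(signers)

    # Hold out a fraction of the *signers* (at least one), not of the videos.
    n_test = max(1, round(len(signers) * test_fraction))
    test_signers = set(signers[:n_test])

    train = [s for s in samples if s[1] not in test_signers]
    test = [s for s in samples if s[1] in test_signers]
    return train, test, test_signers


# 50 toy videos from 5 signers; one signer's 10 videos end up in the test set.
videos = [(f"v{i}", f"signer{i % 5}") for i in range(50)]
train_set, test_set, held_out = signer_independent_split(videos, 0.2, seed=1)
print(len(train_set), len(test_set), held_out)
```

A random split over videos, by contrast, lets the same signer appear on both sides, which is one plausible reason the random-split accuracy reported above (95.95%) is much higher than the user-independent one (62.02%).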

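The AUTSL abstract mentions temporal attention on top of LSTM outputs without giving its form. A minimal sketch, assuming simple dot-product scoring against a learned vector `v` (many published variants instead score with a small MLP such as v^T tanh(W h_t)), computes one weight per time step with a softmax and returns the weighted sum of the hidden states as a clip-level descriptor for the classifier. The function `temporal_attention_pool` is hypothetical and is not the AUTSL authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def temporal_attention_pool(
    hidden: Sequence[Sequence[float]], v: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical temporal attention pooling.

    hidden[t] is the D-dim (e.g. LSTM) output at time step t.
    v is an assumed learned scoring vector of length D.
    Returns (per-step weights alpha, D-dim context vector).
    """
    # Dot-product relevance score for each time step.
    scores = [sum(h_d * v_d for h_d, v_d in zip(h, v)) for h in hidden]

    # Softmax over time; subtracting the max keeps exp() from overflowing.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]

    # Context vector: attention-weighted average of the hidden states.
    dim = len(hidden[0])
    context = [sum(alpha[t] * hidden[t][d] for t in range(len(hidden))) for d in range(dim)]
    return alpha, context


# Three time steps of 2-dim hidden states; the last step scores highest.
weights, clip_vector = temporal_attention_pool([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]], [1.0, 1.0])
print(weights, clip_vector)
```

Compared with taking only the last LSTM state, the weighted sum lets frames that carry the core of the sign contribute more to the final prediction.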


Generative model‐enhanced human motion prediction

et al. 2022


The task of predicting human motion is complicated by the natural heterogeneity and compositionality of actions, necessitating robustness to distributional shifts as far as out-of-distribution (OoD). Here, we formulate a new OoD benchmark based on the Human3.6M and Carnegie Mellon University (CMU) motion capture datasets, and introduce a hybrid framework for hardening discriminative architectures to OoD failure by augmenting them with a generative model. When applied to current state-of-the-art discriminative models, we show that the proposed approach improves OoD robustness without sacrificing in-distribution performance, and can theoretically facilitate model interpretability. We suggest human motion predictors ought to be constructed with OoD challenges in mind, and provide an extensible general framework for hardening diverse discriminative architectures to extreme distributional shift.


A review on computational methods based automated sign language recognition system for hearing and speech impaired community

Robert; Hemanth

2023

Concurrency and Computation


Summary The recent advancements in computer vision and deep learning have led to promising progress in various motion detection and gesture recognition methods. Thriving efforts in the field of sign language recognition (SLR) during recent years led to interaction between humans and computer systems. Contributing a real‐time automated Sign Language recognition system will be a remarkable treasure for hearing and speech‐impaired people that will break the barriers of interaction with the real world. Albeit several research works are accomplished in sign language recognition, there is still a demand for developing a real‐time automated sign language recognition system. Compared to other methodologies, the techniques adopted may have advantages and disadvantages, and they vary from researcher to researcher. There are still some issues with employing these SLR models and procedures regularly, even though numerous research studies have been conducted to determine the most acceptable methods and models for sign language recognition. It gets more expensive and difficult in terms of resources when the developed automated SLR system becomes available as a product. Hence, for the welfare of the hearing and speech‐impaired community, the researchers are still endeavoring way to find a cost‐effective method. This work brings forth the challenges faced by scientists in developing a cost‐effective commercial prototype for the hearing and speech‐impaired community. This paper also explores and analyses various deep learning techniques and methods used in developing a sign language recognition system. The objective behind this work is to identify the best method which produces high accuracy in developing a cost‐effective sign language recognition system aiding communication between signers and non‐signers so that they can be a part of growing technology.
