2021
DOI: 10.48550/arxiv.2110.06161
Preprint
Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble

Abstract: Sign language is commonly used by deaf or mute people to communicate but requires extensive effort to master. It is usually performed with fast yet delicate movements of hand gestures, body posture, and even facial expressions. Current Sign Language Recognition (SLR) methods usually extract features via deep neural networks and suffer from overfitting due to limited and noisy data. Recently, skeleton-based action recognition has attracted increasing attention due to its subject-invariant and background-invariant…

Cited by 10 publications (19 citation statements)
References 66 publications (77 reference statements)
“…The Skeleton-Aware multi-stream sign language recognition framework is one of the most recent graph-based systems for sign language recognition [32,33]. These frameworks combine ST-GCN [31] with additional input channels such as RGB frames and optical flow; in this multimodality scheme, the different modalities are integrated and fused at different levels.…”
Section: Related Work
confidence: 99%
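The multimodality scheme described above, in which per-stream predictions are combined at the score level, can be sketched as a weighted late fusion of class scores. This is a minimal illustration, not the paper's actual ensemble: the stream names, class count, and weights below are hypothetical.

```python
import numpy as np

def late_fusion(stream_scores, weights=None):
    """Combine per-stream class scores by a weighted sum (late fusion).

    stream_scores: list of (num_classes,) arrays, one per modality
    weights: optional per-stream weights; defaults to uniform
    """
    scores = np.stack(stream_scores)          # (num_streams, num_classes)
    if weights is None:
        weights = np.ones(len(stream_scores))
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()                  # normalize weights to sum to 1
    return weights @ scores                   # fused (num_classes,) scores

# Hypothetical softmax scores from three modalities over 4 sign classes
skeleton = np.array([0.1, 0.6, 0.2, 0.1])
rgb      = np.array([0.2, 0.5, 0.2, 0.1])
flow     = np.array([0.1, 0.4, 0.4, 0.1])

# Weight the skeleton stream more heavily, then pick the argmax class
fused = late_fusion([skeleton, rgb, flow], weights=[2, 1, 1])
predicted = int(np.argmax(fused))  # class 1 wins after fusion
```

In practice the fusion weights are tuned on a validation set; fusing at the score level lets each stream keep its own specialized backbone.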
“…Jiang et al [142] devised SAM-SLR (a skeleton-aware multi-modal framework) for isolated sign language recognition. Its skeleton stream uses an SSTCN (separable spatial-temporal convolution network), and the framework achieves strong accuracy on the AUTSL dataset, with a top-1 accuracy of 98.42% for RGB and 98.53% for RGB-D. Papastratis et al [143] Jiang et al [145] designed SAM-SLR-v2 (a skeleton-aware multi-modal framework with a global ensemble model) for isolated sign language recognition. They achieved a top-1 accuracy of 98.53% on AUTSL (RGB-D all); top-1 accuracies of 59.39% per instance and 56.63% per class on the WLASL2000 dataset; and a top-1 accuracy of 99% on the isolated SLR500 dataset.…”
Section: B. Study of Current State-of-the-art Models for Sign Language...
confidence: 99%
“…More recently, the success of pose estimation techniques and Graph Convolutional Network (GCN) architectures has shifted researchers' attention to skeleton-based approaches in both action recognition and SLR domains (Kipf and Welling, 2016; Yan et al., 2018; Cao et al., 2019; Jiang et al., 2021). In these methods, graphs are often formed by connecting skeleton joint information (obtained via pose estimation techniques) according to the natural human body connections and processed through a GCN network.…”
Section: Introduction
confidence: 99%
“…In these methods, graphs are often formed by connecting skeleton joint information (obtained via pose estimation techniques) according to the natural human body connections and processed through a GCN network. As an improvement over earlier GCN architectures, ST-GCN was proposed for skeleton-based action recognition to model the spatial and temporal dimensions simultaneously, and was later adapted to the SLR problem (Yan et al., 2018; Jiang et al., 2021).…”
Section: Introduction
confidence: 99%
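The graph construction described in these statements — joints as nodes, natural body links as edges, features propagated by a graph convolution — can be sketched as follows. This is a toy illustration of the standard symmetrically normalized GCN update, not the actual skeleton topology or ST-GCN layer from the cited papers; the 5-joint chain, feature sizes, and random weights are assumptions.

```python
import numpy as np

# Toy skeleton: 5 joints connected along natural body links
# (e.g. wrist-elbow-shoulder-neck-head); edges are illustrative only.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
num_joints = 5

# Adjacency matrix with self-loops (A + I), as in the standard GCN
A = np.eye(num_joints)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Symmetric normalization: A_hat = D^{-1/2} (A + I) D^{-1/2}
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# One graph-convolution step on per-joint features H: H' = ReLU(A_hat H W)
rng = np.random.default_rng(0)
H = rng.standard_normal((num_joints, 3))   # e.g. (x, y, confidence) per joint
W = rng.standard_normal((3, 8))            # learnable weights in a real model
H_out = np.maximum(A_hat @ H @ W, 0.0)     # (5, 8) updated joint features
```

ST-GCN extends this spatial step with temporal convolutions across the same joint over consecutive frames, modeling both dimensions simultaneously.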