2019
DOI: 10.1007/978-3-030-30493-5_59

Spatial-Temporal Graph Convolutional Networks for Sign Language Recognition


Cited by 47 publications (36 citation statements). References 15 publications.
“…It showed encouraging performance on action recognition on the NTU RGB+D dataset [30]. The algorithm proposed in [29] was modified to accept a custom graph layout appropriate for sign language graph representation [31]. This modified version of the algorithm was evaluated on a dataset containing 20 selected classes from the ASLLVD dataset.…”
Section: Related Work
confidence: 99%
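The custom graph layout mentioned above can be illustrated with a minimal sketch: a degree-normalized adjacency matrix built from an assumed (not the paper's) sign-language joint list, fed to a single spatial graph-convolution layer. The edge list, joint count, and the SpatialGraphConv name are all hypothetical.

```python
# A minimal sketch of a custom skeleton graph layout feeding a spatial
# graph convolution; joints and edges below are illustrative assumptions.
import torch
import torch.nn as nn

# Hypothetical layout: a few upper-body joints plus wrists as hand roots.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (5, 6)]  # (parent, child)
NUM_JOINTS = 7

def build_adjacency(edges, num_joints):
    """Symmetric adjacency with self-loops, normalized as D^-1/2 A D^-1/2."""
    a = torch.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

class SpatialGraphConv(nn.Module):
    """One spatial graph-convolution layer over per-joint features."""
    def __init__(self, in_ch, out_ch, adjacency):
        super().__init__()
        self.register_buffer("a_hat", adjacency)
        self.linear = nn.Linear(in_ch, out_ch)

    def forward(self, x):            # x: (batch, joints, channels)
        x = self.a_hat @ x           # aggregate features from graph neighbors
        return torch.relu(self.linear(x))

a_hat = build_adjacency(EDGES, NUM_JOINTS)
layer = SpatialGraphConv(in_ch=3, out_ch=16, adjacency=a_hat)
out = layer(torch.randn(2, NUM_JOINTS, 3))  # 2 clips, (x, y, conf) per joint
print(out.shape)                            # torch.Size([2, 7, 16])
```

Swapping in a different layout only requires changing EDGES and NUM_JOINTS; the convolution itself is unchanged, which is the appeal of the approach described in the quote.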
“…The Skeleton Aware multi-stream sign language recognition framework is one of the most recent graph-based systems for sign language recognition [32,33]. These frameworks combine ST-GCN [31] with other input channels, such as RGB frames and optical flow, in a multi-modality scheme where the different modalities are integrated and fused at different levels. Even though these systems achieve excellent performance on the AUTSL dataset, their main drawback is that they are slow and involve a high computational cost.…”
Section: Related Work
confidence: 99%
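A minimal sketch of the score-level (late) fusion pattern the quote describes, with placeholder per-modality backbones. StreamStub, the feature dimensions, and the fusion weights are hypothetical stand-ins, not the actual framework's components.

```python
# A minimal sketch of late fusion across skeleton, RGB, and flow streams,
# assuming each stream ends in per-class logits over AUTSL's 226 signs.
import torch
import torch.nn as nn

NUM_CLASSES = 226  # AUTSL sign classes

class StreamStub(nn.Module):
    """Stand-in for a per-modality backbone (e.g. ST-GCN, RGB CNN, flow CNN)."""
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.head = nn.Linear(in_dim, num_classes)
    def forward(self, x):            # x: (batch, in_dim) pooled features
        return self.head(x)

streams = nn.ModuleList([
    StreamStub(64, NUM_CLASSES),     # skeleton stream
    StreamStub(512, NUM_CLASSES),    # RGB stream
    StreamStub(512, NUM_CLASSES),    # optical-flow stream
])
weights = [1.0, 1.0, 0.5]            # illustrative per-stream weights

feats = [torch.randn(4, 64), torch.randn(4, 512), torch.randn(4, 512)]
logits = sum(w * s(f) for w, s, f in zip(weights, streams, feats))
pred = logits.argmax(dim=1)          # fused prediction per clip
```

The computational cost the quote criticizes is visible even here: every stream runs a full backbone per clip, so inference time grows roughly linearly with the number of modalities.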
“…Attention LSTM, attention GRU, and Transformer networks were also tested, but they led to inferior performance. De Amorim et al. in [82] proposed an American SLR method that extracts skeletal data from video sequences and then processes them using a Spatio-Temporal Graph Convolutional Network (GCN) [83]. Tunga et al. in [84] proposed an SLR method that extracts skeletal features from video sequences and then employs a GCN to model spatial dependencies among the skeletal data, as well as a BERT model to model temporal dependencies.…”
Section: Sign Language Recognition
confidence: 99%
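The spatial-GCN-then-temporal-BERT pipeline described for [84] can be roughly sketched as below: one graph-convolution step encodes joint dependencies within each frame, then a generic Transformer encoder (standing in for BERT) models dependencies across frames. The GCNTransformer name, all dimensions, and the single-layer depth are assumptions for illustration.

```python
# A rough sketch: per-frame spatial graph convolution, then a Transformer
# encoder over the frame sequence as a stand-in for the BERT temporal model.
import torch
import torch.nn as nn

class GCNTransformer(nn.Module):
    def __init__(self, adjacency, in_ch=3, hid=64, num_classes=100):
        super().__init__()
        self.register_buffer("a_hat", adjacency)   # (J, J) normalized adjacency
        self.gcn = nn.Linear(in_ch, hid)
        enc_layer = nn.TransformerEncoderLayer(d_model=hid, nhead=4,
                                               batch_first=True)
        self.temporal = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.head = nn.Linear(hid, num_classes)

    def forward(self, x):             # x: (batch, frames, joints, channels)
        x = self.a_hat @ x            # spatial aggregation within each frame
        x = torch.relu(self.gcn(x))   # per-joint feature transform
        x = x.mean(dim=2)             # pool joints -> (batch, frames, hid)
        x = self.temporal(x)          # self-attention across frames
        return self.head(x.mean(dim=1))   # clip-level logits

a_hat = torch.eye(7)                  # identity graph as a stand-in adjacency
model = GCNTransformer(a_hat)
print(model(torch.randn(2, 16, 7, 3)).shape)  # torch.Size([2, 100])
```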
“…In this way, the authors achieved a very high accuracy of 97.36% on the CSL-500 dataset. GCNs are computationally lighter than image-processing networks, but they often cannot extract highly enriched features, thus leading to inferior performance, as noted in [82].…”
Section: Sign Language Recognition
confidence: 99%
“…However, skeleton-based SLR methods are still under exploration. Simply applying ST-GCN to SLR has been unsuccessful, reaching only around 60% top-1 accuracy on 20 classes (much worse than RGB-based approaches) [48]. Multi-modal approaches aim to exploit data captured from different sources, by different devices, and from distinct views to improve the overall performance.…”
Section: Related Work
confidence: 99%