2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw53098.2021.00383
Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention

Abstract: Automatic sign language recognition lies at the intersection of natural language processing (NLP) and computer vision. The highly successful transformer architectures, based on multi-head attention, originate from the field of NLP. The Video Transformer Network (VTN) is an adaptation of this concept for tasks that require video understanding, e.g., action recognition. However, due to the limited amount of labeled data that is commonly available for training automatic sign (language) recognition, the VTN cannot…
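The abstract's core mechanism, multi-head attention over a sequence of per-frame embeddings, can be illustrated with a minimal single-head sketch. This shows scaled dot-product self-attention in general, not the paper's implementation; the frame count, embedding size, and identity projection matrices below are assumptions for the toy example.

```python
import numpy as np

def self_attention(frames, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of frame embeddings.

    frames: (T, d) array, one row per video frame embedding.
    Wq, Wk, Wv: (d, d) projection matrices for queries, keys, values.
    """
    Q, K, V = frames @ Wq, frames @ Wk, frames @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (T, T) pairwise frame affinities
    # Numerically stable softmax over the frame axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (T, d): each frame re-expressed as a mix of all frames

# Toy example: 4 frames with 8-dim embeddings, identity projections.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
I = np.eye(8)
out = self_attention(x, I, I, I)
print(out.shape)  # (4, 8)
```

A multi-head version would run several such heads with separate projections and concatenate their outputs; the VTN applies this over features extracted from video frames.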

Cited by 33 publications (13 citation statements)
References 22 publications
“…The author used a custom dataset of over 24,624 images for the experiment. Mathieu De Coster et al. [8] proposed a sign language recognition methodology over the Flemish Sign Language Corpus. The authors used OpenPose feature extraction and end-to-end learning with a CNN, and applied a multi-head attention approach to isolated sign recognition.…”
Section: Related Work
confidence: 99%
“…Over a class of 100 signs, a state-of-the-art accuracy of 74.7% has been obtained on the Flemish Sign Language Corpus. The authors introduce the Multimodal Transformer Network with Pose LSTM and Pose Transformer variants, applying self-attention for sign language recognition [8]. Mannan A. et al. [9] proposed a hyper-tuned deep CNN for static American Sign Language; the authors used data augmentation to create more training samples, as deep learning model accuracy increases with more data available for training.…”
Section: Related Work
confidence: 99%
“…Other recent work employed a Video Transformer Network (VTN) for sign language recognition [34]. The VTN is a modified version of the transformer that was originally developed for machine translation [35].…”
Section: Related Work
confidence: 99%
“…The proposed architecture was compared with the state-of-the-art graph-based architecture on both AUTSL and ASLLVD datasets. In Table 3, the performance of the proposed architecture is compared with the reported results for different variants of the VTN architecture on the AUTSL dataset [34]. From Table 3, we can observe that the proposed architecture with spatial attention enhancement outperformed the best variant of VTN (VTN-PF) on both the validation and test datasets.…”
confidence: 99%
“…They achieved a top-1 accuracy of 69.9% on the NMFs-CSL dataset and 96.8% on the isolated SLR 500 dataset. De Coster et al. [141] proposed pose flow and hand cropping in combination with a Video Transformer Network for isolated sign language recognition. The VTN-PF (Video Transformer Network with hand cropping and pose flow) model achieved an accuracy of 92.92% on the AUTSL dataset.…”
Section: B Study Of Current State-of-the-art Models For Sign Language...
confidence: 99%
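The pose-flow input named above (the "PF" in VTN-PF) is derived from pose keypoints. A common way to obtain a flow-like signal from keypoints is to take frame-to-frame displacements, sketched below under the assumption of OpenPose-style (x, y) keypoint arrays; the paper's exact pose-flow definition may differ, and the function name `pose_flow` and the array shapes are hypothetical.

```python
import numpy as np

def pose_flow(keypoints):
    """Frame-to-frame displacement of pose keypoints (a flow-like signal).

    keypoints: (T, K, 2) array of (x, y) positions for K joints over T frames.
    Returns a (T-1, K, 2) array of displacement vectors between consecutive frames.
    """
    return np.diff(keypoints, axis=0)

# Toy example: 3 frames, 5 joints; all joints shift by (1, 1) after frame 0.
kp = np.zeros((3, 5, 2))
kp[1:] += 1.0
flow = pose_flow(kp)
print(flow.shape)  # (2, 5, 2)
```

Such displacement features complement raw keypoint positions by encoding motion explicitly, which is why pose-based sign recognition models often feed both to the network.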