MEN: Mutual Enhancement Networks for Sign Language Recognition and Education

Liu, Zhengzhe; Pang, Liang; Qi, Xiaojuan

doi:10.1109/tnnls.2022.3174031

Cited by 10 publications

(5 citation statements)

References 88 publications


“…Various other appearance-based baselines have also been proposed in [15] including a) 2D CNN + Gated Recurrent Unit (GRU) and b) 3D-CNN claiming the best results obtained by the I3D network. In [29], a SLR and education system is proposed. This SLR system is built upon a spatiotemporal network for semantic category identification of a given sign video while the education system detects the failure mode of learners and guides them to sign correctly.…”

Section: Related Work (mentioning)

confidence: 99%

Signgraph: An Efficient and Accurate Pose-Based Graph Convolution Approach Toward Sign Language Recognition

et al. 2023


Sign language recognition (SLR) enables the deaf and speech-impaired community to integrate and communicate effectively with the rest of society. Word-level, or isolated, SLR is a fundamental yet complex task with the main objective of using models to correctly recognize signed words. Sign language consists of very fast and complex hand, body, and face movements, along with mouthing cues, which make the task very challenging. Several input modalities (RGB, optical flow, RGB-D, and pose/skeleton) have been proposed for SLR. However, these modalities are complex, and the state-of-the-art (SOTA) methodologies tend to be exceedingly sophisticated and over-parametrized. In this paper, our focus is to use the hands and body poses as an input modality. One major problem in pose-based SLR is extracting the most valuable and distinctive features for all skeleton joints. In this regard, we propose an accurate, efficient, and lightweight pose-based pipeline leveraging a graph convolution network (GCN) along with residual connections and a bottleneck structure. The proposed architecture not only facilitates efficient learning during model training, providing significantly improved accuracy scores, but also alleviates computational complexity. With the proposed architecture in place, we are able to achieve improved accuracies on three different subsets of the WLASL dataset and the LSA-64 dataset. Our proposed model outperforms previous SOTA pose-based methods by providing a relative improvement of 8.91%, 27.62%, and 26.97% for WLASL-100, WLASL-300, and WLASL-1000 subsets. Moreover, our proposed model also outperforms previous SOTA appearance-based methods by providing a relative improvement of 2.65% and 5.15% for WLASL-300 and WLASL-1000 subsets. For the LSA-64 dataset, our model is able to achieve 100% test recognition accuracy. We are able to achieve this improved performance with far less computational cost as compared to existing appearance-based methods.



“…Another advantage of ReLU is that it helps to address the problem of vanishing gradients, which can occur when using other activation functions such as sigmoid or tanh. This is because ReLU does not saturate for positive input values, which means that it does not cause the gradients to become small, and thus, it does not hinder the learning process [17,18]. Besides, ReLU is an optimal choice for CNN architecture in Hand Sign Recognition due to its non-linearity and computational efficiency.…”

Section: Figure 2 Proposed Architecture (mentioning)

confidence: 99%

A Deep Learning-Based Approach for Hand Sign Recognition Using CNN Architecture

Parashar, Thakur, Raju et al. 2023, RIA


The domain of hand sign recognition, an integral facet of computer vision, encompasses a wide array of practical applications, ranging from interpreting sign language and recognizing gestures to facilitating human-computer interaction. This research introduces a Convolutional Neural Network (CNN) model tailored to the identification of hand signs representing the English alphabet. For model training and validation, a dataset comprising 26,000 grayscale images of hand signs was employed. The model architecture is a deep CNN design, featuring numerous layers for convolution and pooling, followed by fully connected layers. Employing the Adam optimizer, the training procedure yielded an accuracy of 96.7% when evaluated on the Kaggle dataset. These outcomes underscore the effectiveness of the proposed CNN model in precisely discerning hand signs corresponding to the English alphabet. The model's potential utility extends to the recognition of intricate manual gestures and real-time applications, including aiding individuals with motor impairments and enriching virtual reality experiences. Hence, this study accentuates the capacity of deep learning to propel the domain of hand sign recognition forward.
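The citation statement quoted above for this paper says that ReLU does not saturate for positive inputs and so avoids the shrinking gradients seen with sigmoid. A minimal sketch of that claim follows; it is not taken from the cited paper. The helper `chained_gradient` is hypothetical and assumes a chain of identical layers with all weights fixed at 1, so the backpropagated factor is simply the local derivative raised to the depth.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activation: float, depth: int) -> float:
    """Hypothetical helper: product of `depth` identical local derivatives.

    Assumes every layer sees the same pre-activation and has weight 1,
    which is an illustrative simplification, not a real network.
    """
    return grad_fn(pre_activation) ** depth


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        sig = chained_gradient(sigmoid_grad, 1.0, depth)
        rel = chained_gradient(relu_grad, 1.0, depth)
        print(f"depth={depth:2d}  sigmoid={sig:.3e}  relu={rel:.1f}")
```

Under these assumptions the sigmoid factor shrinks geometrically with depth, while the ReLU factor stays at 1 for any positive pre-activation. For non-positive inputs the ReLU derivative is 0, a case the quoted statement does not address.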


“…The fusion of multiple modalities provided robustness against environmental variations and improved performance in real-world scenarios. R. Li et al. explored the effectiveness of transfer learning in hand sign language recognition using pre-trained CNN models [8]. By fine-tuning CNN architectures pre-trained on large-scale image datasets, they achieved notable improvements in recognition accuracy, demonstrating the potential of transfer learning for this task.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Learning Recognition for Arabic Alphabet Sign Language RGB Dataset

Kharoua, Jiang 2024, JCC


This paper introduces a Convolutional Neural Network (CNN) model for Arabic Alphabet Sign Language (AASL) recognition, using the AASL dataset. Recognizing the fundamental importance of communication for the hearing-impaired, especially within the Arabic-speaking deaf community, the study emphasizes the critical role of sign language recognition systems. The proposed methodology achieves high accuracy, with the CNN model reaching 99.9% accuracy on the training set and a validation accuracy of 97.4%. This study not only establishes a high-accuracy AASL recognition model but also provides insights into effective dropout strategies. The achieved high accuracy rates position the proposed model as a significant advancement in the field, holding promise for improved communication accessibility for the Arabic-speaking deaf community.
