2019
DOI: 10.1109/access.2019.2959206

Multimodal Spatiotemporal Networks for Sign Language Recognition

Abstract: Unlike other human behaviors, sign language is characterized by limited local motion of the upper limbs and meticulous hand actions. Some sign language gestures are ambiguous in RGB video due to the influence of lighting and background color, which affects recognition accuracy. We propose a multimodal deep learning architecture for sign language recognition that effectively combines RGB-D input and two-stream spatiotemporal networks. Depth videos, as an effective compensation for RGB input, can su…

Cited by 25 publications (14 citation statements)
References 47 publications (45 reference statements)
“…The aforementioned previous study achieved 72.73% accuracy on the leap motion data alone, which is the same dataset used in the experiments in this article. Zhang et al.'s 2019 study [25] found that multi-modality could drastically improve sign recognition when fusing RGB and depth data. The model presented in that study was computationally expensive, requiring two VGG16 convolutional neural networks to process the sensor information.…”
Section: Background and Related Work
confidence: 99%
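For readers unfamiliar with the fusion design mentioned in that statement, here is a minimal sketch of a two-stream RGB-D network in the spirit of the cited approach: one VGG16 backbone per modality with concatenated features. The fusion point, layer sizes, and class count are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical two-stream RGB-D fusion sketch (not the cited paper's
# exact architecture): one VGG16 feature extractor per modality,
# feature maps concatenated before a classifier head.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class TwoStreamRGBD(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Separate VGG16 backbones for RGB frames and depth frames.
        # Depth maps are assumed replicated to 3 channels to match
        # VGG16's expected input.
        self.rgb_stream = vgg16(weights=None).features
        self.depth_stream = vgg16(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # Classifier over the concatenated 512+512-channel features.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * 512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        f_rgb = self.pool(self.rgb_stream(rgb))        # (N, 512, 7, 7)
        f_depth = self.pool(self.depth_stream(depth))  # (N, 512, 7, 7)
        return self.classifier(torch.cat([f_rgb, f_depth], dim=1))
```

Running two full VGG16 backbones per frame is what makes this design computationally expensive, as the citing study points out.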
“…These features were finally concatenated and fed to an encoder-decoder LSTM network that predicted the sub-words that form the signed word. Zhang et al. in [91] proposed a highly accurate SLR method that first selected pairs of aligned RGB-D images to reduce redundancy. The method then computed discriminative features from hand regions using a spatial stream and extracted depth motion features using a temporal stream.…”
Section: Sign Language Recognition
confidence: 99%
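As a rough illustration of the spatial/temporal two-stream idea described in that statement, the following sketch concatenates per-frame spatial features with motion features and classifies the sequence with an LSTM. The feature dimensions and the single-layer LSTM are assumptions for illustration, not the cited architecture.

```python
# Illustrative sketch, under assumed dimensions: fuse per-frame
# spatial (hand-region) features with temporal (depth-motion)
# features, then classify the sequence from the LSTM's final state.
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    def __init__(self, spatial_dim: int = 512, temporal_dim: int = 512,
                 hidden: int = 256, num_classes: int = 100):
        super().__init__()
        self.lstm = nn.LSTM(spatial_dim + temporal_dim, hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, spatial_feats: torch.Tensor,
                temporal_feats: torch.Tensor) -> torch.Tensor:
        # Both inputs: (N, T, dim) sequences of per-frame features.
        fused = torch.cat([spatial_feats, temporal_feats], dim=-1)
        _, (h_n, _) = self.lstm(fused)
        return self.head(h_n[-1])  # classify from the last hidden state
```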
“…Zhang et al. [25] used RGB and depth images together in their study and gained a 6% improvement compared to using RGB alone. They also reported that depth images were more robust to changes in lighting and environment and captured the signs better.…”
Section: Related Work
confidence: 99%
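A common way to realize the kind of RGB-plus-depth combination reported in that statement is score-level (late) fusion. The sketch below averages per-class probabilities from two single-modality models; the weighting scheme is an assumption, not the method of the cited study, which reports the ~6% gain but not this exact scheme.

```python
# Hedged illustration of late (score-level) fusion of an RGB-only
# model and a depth-only model; the 50/50 default weight is an
# assumption, not taken from the cited paper.
import torch

def fuse_scores(rgb_logits: torch.Tensor,
                depth_logits: torch.Tensor,
                w_rgb: float = 0.5) -> torch.Tensor:
    """Weighted average of per-class probabilities from two modalities."""
    p_rgb = torch.softmax(rgb_logits, dim=-1)
    p_depth = torch.softmax(depth_logits, dim=-1)
    return w_rgb * p_rgb + (1.0 - w_rgb) * p_depth

# Predicted class: fuse_scores(rgb_logits, depth_logits).argmax(dim=-1)
```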