Method for Multimodal Recognition of One-Handed Sign Language Gestures Through 3D Convolution and LSTM Neural Networks

Kagirov, Ildar; Ryumin, Dmitry; Axyonov, Alexandr

doi:10.1007/978-3-030-26061-3_20

Cited by 9 publications

(6 citation statements)

References 30 publications


“…However, it is worth noting that there is a problem with each signer showing gestures at different speeds. That is why almost all modern gesture recognition methods are reduced to processing a video sequence that provides information about the movements of any part of the human body, for example, a hand or both hands in time and space [ 129 , 130 , 131 , 132 , 133 , 134 ]. Additionally, the presence of complex background situations on video frames that dynamically change leads to rather serious recognition problems due to insufficient use of the spatial features: hand gestures are relatively small in size compared to the entire background environment.…”

Section: Methods (mentioning)

confidence: 99%

Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices

Ryumin, Ivanko, Ryumina

2023

Sensors

Self Cite


Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual information can be used for both automatic lip-reading and gesture recognition. Hand gestures are a form of non-verbal communication and can be used as a very important part of modern human–computer interaction systems. Currently, audio and video modalities are easily accessible by sensors of mobile devices. However, there is no out-of-the-box solution for automatic audio-visual speech and gesture recognition. This study introduces two deep neural network-based model architectures: one for AVSR and one for gesture recognition. The main novelty regarding audio-visual speech recognition lies in fine-tuning strategies for both visual and acoustic features and in the proposed end-to-end model, which considers three modality fusion approaches: prediction-level, feature-level, and model-level. The main novelty in gesture recognition lies in a unique set of spatio-temporal features, including those that consider lip articulation information. As there are no available datasets for the combined task, we evaluated our methods on two different large-scale corpora—LRW and AUTSL—and outperformed existing methods on both audio-visual speech recognition and gesture recognition tasks. We achieved AVSR accuracy for the LRW dataset equal to 98.76% and gesture recognition rate for the AUTSL dataset equal to 98.56%. The results obtained demonstrate not only the high performance of the proposed methodology, but also the fundamental possibility of recognizing audio-visual speech and gestures by sensors of mobile devices.



“…End-to-end (integral) encoder-decoder models have been developed and studied for continuous Russian speech recognition using connectionist temporal classification with various types of neural networks, such as Highway, ResNet, DenseNet, DiracNet, and Transformer, trained with augmentation of the training speech data; they showed higher recognition speed than the standard speech recognition system [65,66]. A method has been developed for multimodal (color video stream and depth map) recognition of static and dynamic one-handed gestures of Russian Sign Language using a three-dimensional convolutional deep neural network with long short-term memory (LSTM), which makes it possible to extract both short-term and long-term spatio-temporal features of gestures [67,68]. A method has been developed for emotion recognition in conversational speech based on a hierarchical recurrent neural network model with long short-term memory (RNN-LSTM), along with a data adaptation method that allows effective use of a cross-corpus experimental setup, which makes it possible to increase the amount of training data and make the model more universal [69-72].…”

Section: R. M. Yusupov, D. V. Bakuradze, St. Petersburg Institute of Informat... (unclassified)

History of SPb FRC RAS: 45 Years of Scientific Activity (История СПБ ФИЦ РАН: 45 Лет Научной Деятельности)

2023


This publication is dedicated to the 45th anniversary of the Federal State Budgetary Institution of Science "St. Petersburg Federal Research Center of the Russian Academy of Sciences". It contains articles on the history of the center's founding and development, as well as copies of a number of informational and historical documents.


“…The results of current research give grounds to believe that machine learning methods based on deep neural networks have certain distinctive characteristics compared with traditional classical approaches [46], which rely on linear classifiers (for example, the support vector machine). They show good results in segmentation, classification, and recognition of both static and dynamic gestures.…”

Section: Conclusion (unclassified)

Approaches to Automatic Gesture Recognition: Hardware and Methods Overview.

Ryumin, Kagirov

2021

Manned Spaceflight


In this paper, hardware and software solutions addressed to automatic gesture recognition are considered. Trends in image analysis in the current computer vision-based approaches are analysed. Each of the considered approaches was addressed, in order to reveal their advantages and drawbacks. Research papers on the usability of gesture interfaces were reviewed. It was revealed that sensor-based systems, being quite accurate and demonstrating high speed of recognition, have limited application due to the specificity of devices (gloves, suit) and their relatively narrow distribution. At the same time, computer vision-based approaches can be successfully applied only when problems of occlusions and datasets are solved. The results obtained can be used for designing training systems.

