ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054316

Key Action and Joint CTC-Attention based Sign Language Recognition

Cited by 15 publications (5 citation statements); references 8 publications.
“…The results are presented in Figures 11–13, respectively. For comparison purposes, we include methods such as [15] (Gao et al., 2021), [22] (Pu et al., 2018), [23] (Huang et al., 2018), [24] (Wang et al., 2018a), [25] (Venugopalan et al., 2015b), [26] (Luong et al., 2015), [27] (Chan et al., 2016), [28] (Bin et al., 2018), [29] (Camgoz et al., 2020), [30] (Li et al., 2020), [31] (Lafferty et al., 2001), [32] (Morency et al., 2007), [33] (Zhang et al., 2014), [34] (Venugopalan and Rohrbach, 2015a), [35] (Pan et al., 2016) and [36] (Guo et al., 2019). We observe that the proposed Savitar model outperforms the state of the art by a large margin on all three datasets, indicating the superiority of our model.…”
Section: Methods
Citation type: mentioning (confidence: 99%)
“…Huang et al. [22] introduced an attention mechanism into the encoder-decoder architecture to measure the influence of every input on the current decoding position. In addition, to effectively learn the multi-level information in sign language data, several works [23–25] adopted hierarchical models to explore the information between levels. Zhou et al. [8] proposed a Spatial-Temporal Multi-Cue (STMC) network to explore the effective complementarity of multimodal information.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
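The encoder-decoder attention idea referenced in [22], where every encoder input is weighted according to its influence on the current decoding position, can be illustrated with a short sketch. The following is a minimal PyTorch example of Bahdanau-style additive attention over encoder states; the class name `AdditiveAttention` and all layer sizes are illustrative assumptions, not the exact mechanism used in the cited work.

```python
# Minimal sketch of additive (Bahdanau-style) encoder-decoder attention.
# Names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)  # project encoder states
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)  # project decoder state
        self.v = nn.Linear(attn_dim, 1, bias=False)            # score each time step

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, T, enc_dim) -- all input frames
        # dec_state:  (batch, dec_dim)    -- current decoding position
        scores = self.v(torch.tanh(self.w_enc(enc_states)
                                   + self.w_dec(dec_state).unsqueeze(1)))  # (batch, T, 1)
        weights = torch.softmax(scores, dim=1)        # influence of every input frame
        context = (weights * enc_states).sum(dim=1)   # weighted sum -> context vector
        return context, weights.squeeze(-1)

# Example usage with assumed dimensions:
attn = AdditiveAttention(enc_dim=512, dec_dim=256, attn_dim=128)
context, weights = attn(torch.randn(2, 100, 512), torch.randn(2, 256))
```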
“…When conducting experiments, the CSL dataset was partitioned following previous work [25], while the RWTH-Phoenix-Weather 2014 multi-signer dataset follows the settings adopted in a previous study [36] and is divided into a training set, a test set and a development (dev) set.…”
Section: Datasets
Citation type: mentioning (confidence: 99%)
“…Furthermore, a hierarchical BLSTM with attention over sliding windows was used in the decoder to weigh the importance of the input frames. Li et al. [46] used a pyramid structure of BLSTMs to find key actions in the video representations produced by the 2D-CNN. Moreover, an attention-based LSTM was used to align the input and output sequences, and the whole network was trained jointly with cross-entropy and CTC losses.…”
Section: Sign Language Recognition
Citation type: mentioning (confidence: 99%)
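The joint training described above, which combines the attention decoder's cross-entropy objective with a CTC objective, amounts to optimizing a weighted sum of the two losses. Below is a minimal PyTorch sketch under assumed tensor shapes; the interpolation weight `ctc_weight` and the helper `joint_loss` are illustrative, not the exact configuration from [46].

```python
# Minimal sketch of a joint CTC / cross-entropy objective.
# Shapes, the blank index, and the weighting factor are assumptions.
import torch
import torch.nn as nn

ctc_loss_fn = nn.CTCLoss(blank=0, zero_infinity=True)
ce_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks padded targets
ctc_weight = 0.3  # assumed interpolation factor between the two objectives

def joint_loss(ctc_log_probs, input_lengths, ctc_targets, target_lengths,
               attn_logits, attn_targets):
    # ctc_log_probs: (T, batch, vocab) log-probabilities from the CTC branch
    # attn_logits:   (batch, L, vocab) logits from the attention decoder
    loss_ctc = ctc_loss_fn(ctc_log_probs, ctc_targets, input_lengths, target_lengths)
    loss_ce = ce_loss_fn(attn_logits.reshape(-1, attn_logits.size(-1)),
                         attn_targets.reshape(-1))
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_ce
```

In this kind of joint objective, the CTC term encourages a monotonic alignment between input frames and output labels, while the cross-entropy term supervises the attention decoder; the weight trades off the two signals.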