Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks

Liao, Yanqiu; Xiong, Pengwen; Min, Weidong; Min, Weiqiong; Lu, Jiahao

doi:10.1109/access.2019.2904749

Cited by 131 publications

(59 citation statements)

References 46 publications

“…The hidden state of the current clip is fed into a softmax layer to estimate class-conditional probabilities, using connectionist temporal classification as the cost function. Liao et al. [32] developed a deep 3-dimensional residual ConvNet and bidirectional LSTM networks for dynamic sign language recognition. The hand was localized in the video frames using Faster R-CNN; a 3D ResNet then jointly extracted spatial and temporal features from the input image sequences, which were classified using a bidirectional LSTM.…”
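
The quoted passage mentions training with connectionist temporal classification (CTC) as the cost function. At inference time, CTC outputs are commonly decoded with the best-path (greedy) rule: take the most probable class at each time step, merge consecutive repeats, then drop the blank symbol. The Python sketch below illustrates only that decoding rule; it is not code from any of the cited papers, and the blank index of 0 and the toy probabilities are assumptions.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per time step, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    decoded: List[int] = []
    prev: Optional[int] = None
    for label in best_path:
        # A label is emitted only when it differs from the previous step and is
        # not blank; a blank between two equal labels therefore keeps both.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    # Toy per-clip softmax outputs over {blank, sign 1, sign 2}
    probs = [
        [0.1, 0.8, 0.1],    # sign 1
        [0.2, 0.7, 0.1],    # sign 1 again (merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # sign 2
        [0.6, 0.2, 0.2],    # blank
        [0.1, 0.1, 0.8],    # sign 2 (kept: separated by a blank)
    ]
    print(ctc_greedy_decode(probs))  # [1, 2, 2]
```

Greedy decoding is the simplest option; beam search over CTC outputs is a common, more accurate alternative.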

Section: Related Work (mentioning)

confidence: 99%

DeepArSLR: A Novel Signer-Independent Deep Learning Framework for Isolated Arabic Sign Language Gestures Recognition

Aly

2020

IEEE Access

103

Hand gesture recognition has attracted the attention of many researchers due to its wide applications in robotics, games, virtual reality, sign language, and human-computer interaction. Sign language is a structured form of hand gestures and the most effective means of communication among hearing-impaired people. Developing an efficient sign language recognition system for dynamic isolated gestures faces three major challenges: hand segmentation, hand shape feature representation, and gesture sequence recognition. Traditional sign language recognition methods use color-based algorithms to segment hands, hand-crafted features to represent hand shape, and Hidden Markov Models (HMMs) for sequence recognition. In this paper, a novel framework is proposed for signer-independent sign language recognition that combines multiple deep learning architectures for hand semantic segmentation, hand shape feature representation, and sequence recognition with a deep recurrent neural network. The recently developed semantic segmentation method DeepLabv3+ is trained on a set of pixel-labeled hand images to extract hand regions from each frame of the input video. The extracted hand regions are then cropped and scaled to a fixed size to alleviate hand scale variations. Hand shape features are extracted using a single-layer Convolutional Self-Organizing Map (CSOM) instead of relying on transfer learning from pretrained deep convolutional neural networks. The resulting sequence of feature vectors is then recognized using a deep Bi-directional Long Short-Term Memory (BiLSTM) recurrent neural network consisting of three BiLSTM layers, one fully connected layer, and a softmax layer. The performance of the proposed method is evaluated on a challenging Arabic sign language database containing 23 isolated words captured from three different users.
Experimental results show that the proposed framework outperforms state-of-the-art methods by a large margin under the signer-independent testing strategy. INDEX TERMS Arabic sign language recognition, deep learning, hand semantic segmentation, convolutional self-organizing map, signer-independent, deep BiLSTM network.
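
The cropping step described in the DeepArSLR abstract (cut out the segmented hand region, then scale it to a fixed size) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the hand mask is assumed to be a binary grid already produced by a segmentation model such as DeepLabv3+, frames are single-channel grids, and nearest-neighbour resampling and the output size are hypothetical choices the abstract does not specify.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def mask_bbox(mask: Grid) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of nonzero mask pixels, inclusive; None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)] if mask else []
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_and_resize(frame: Grid, mask: Grid, size: int) -> Optional[Grid]:
    """Crop the frame to the hand mask's bounding box, then rescale the crop
    to size x size using nearest-neighbour sampling (an assumed choice)."""
    box = mask_bbox(mask)
    if box is None:
        return None  # no hand pixels detected in this frame
    top, left, bottom, right = box
    h, w = bottom - top + 1, right - left + 1
    # Output pixel (i, j) samples source pixel (top + i*h//size, left + j*w//size),
    # which always stays inside the bounding box.
    return [
        [frame[top + (i * h) // size][left + (j * w) // size] for j in range(size)]
        for i in range(size)
    ]
```

Because every crop comes out at the same size, the downstream feature extractor sees hands at a consistent scale, which is the scale-variation problem the abstract says this step addresses.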

“…Table 3 lists the major papers related to Chinese Sign Language Recognition and their focus, including conference, journal, and workshop papers. These articles cover sensor-based and vision-based recognition methods and introduce advanced and popular techniques such as SVM [23,24,39,40,101,102], DTW [103-105], HMM [20,22,23,101,106-114], LSTM [115], ANN [116], CNN (3D-CNN) [70,71,117,118], HOD, HOG [23,24,103], and PCA [20]. The hidden Markov model (HMM) is a general-purpose method for Sign Language Recognition.…”
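
The quoted survey describes HMMs as a common approach to sign language recognition. A standard way to use them for isolated signs is to train one HMM per sign and label a new observation sequence with the model that assigns it the highest likelihood, computed with the forward algorithm. The following is a minimal sketch under assumptions: observations are discrete symbols (for example, hypothetical quantized hand-shape codes), and all model parameters are illustrative rather than taken from any surveyed paper.

```python
from typing import Dict, List, Sequence


class DiscreteHMM:
    """HMM with discrete observations; parameters are given, not trained here."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[List[float]]):
        self.start = start  # start[i]: P(state i at t = 0)
        self.trans = trans  # trans[i][j]: P(state j at t + 1 | state i at t)
        self.emit = emit    # emit[i][o]: P(symbol o | state i)

    def likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: P(obs | model), summed over all state paths."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        for o in obs[1:]:
            alpha = [
                sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][o]
                for j in range(n)
            ]
        return sum(alpha)


def classify(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Pick the sign whose HMM gives the observation sequence the highest likelihood."""
    return max(models, key=lambda name: models[name].likelihood(obs))
```

For realistic sequence lengths the raw probabilities underflow, so practical implementations scale alpha at each step or work in log space; Baum-Welch training of the per-sign models is also omitted here.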

Section: Investigation Of Chinese Sign Language Recognition (mentioning)

confidence: 99%

“…Validation on a private Chinese Sign Language vocabulary shows that it is superior to the traditional HMM method. An advanced dynamic SLR method called BLSTM-3D ResNet was presented by Liao et al. [115], comprising a deep three-dimensional residual convolutional network and bidirectional LSTM networks. They localized the hand in video frames and extracted spatiotemporal features with BLSTM-3D ResNet.…”
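
Liao et al.'s pipeline ends with bidirectional LSTM layers over the spatiotemporal features extracted by the 3D ResNet. The plain-Python sketch below shows what "bidirectional" means mechanically: one LSTM reads the clip features forward, a second reads them in reverse, and the two hidden states at each time step are concatenated. Weights are random and untrained, the feature vectors stand in for 3D ResNet outputs, and all dimensions are hypothetical; a practical system would use a deep learning framework with trained parameters.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Single LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One (W, U, b) triple per gate: input, forget, output, candidate.
        self.params = {
            g: (
                [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)],
                [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_hidden)],
                [0.0] * n_hidden,
            )
            for g in "ifoc"
        }

    def step(self, x: Vec, h: Vec, c: Vec) -> Tuple[Vec, Vec]:
        pre = {
            g: [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
            for g, (W, U, b) in self.params.items()
        }
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        cand = [math.tanh(z) for z in pre["c"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, seq: List[Vec]) -> List[Vec]:
        """Return the hidden state after each time step, starting from zeros."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in seq:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(seq: List[Vec], fwd: LSTMCell, bwd: LSTMCell) -> List[Vec]:
    """Concatenate forward states with backward states computed on the reversed sequence."""
    forward = fwd.run(seq)
    backward = bwd.run(seq[::-1])[::-1]  # re-reverse so index t lines up with forward
    return [hf + hb for hf, hb in zip(forward, backward)]


def softmax(z: Vec) -> Vec:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


if __name__ == "__main__":
    clips = [[0.1 * t, 0.05 * t, -0.02 * t] for t in range(5)]  # stand-in clip features
    states = bilstm(clips, LSTMCell(3, 4, seed=1), LSTMCell(3, 4, seed=2))
    print(len(states), len(states[0]))  # 5 8
```

For classification, the concatenated states (for example the last one, or a mean over time) would typically pass through a trained fully connected layer followed by the softmax shown here; that trained layer is outside the scope of this sketch.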

Section: Investigation Of Chinese Sign Language Recognition (mentioning)

confidence: 99%

“…Table 4 presents information including classification and feature extraction approaches, accuracy/performance evaluation, and sample size/datasets. For data acquisition, cameras [20, 22-24, 70, 71, 103, 117] and Kinect [115,127,130,131,146] are the main devices used. Cameras have the advantage of removing the need for additional sensors and reducing cost.…”

Section: Characteristics Of Chinese Sign Language Recognition (mentioning)

confidence: 99%

A Survey on Artificial Intelligence in Chinese Sign Language Recognition

et al. 2020

Chinese Sign Language (CSL) is the main means of communication for hearing-impaired people in China. Sign Language Recognition (SLR) can narrow the gap between hearing-impaired and hearing people and help the former integrate into society. SLR has therefore become a focus of sign language application research. Over the years, the continuous development of new technologies has provided both the means and the motivation for SLR research. This paper aims to cover the most recent approaches in Chinese Sign Language Recognition (CSLR). Through a thorough review of leading CSLR methods from 2000 to 2019, various techniques and algorithms, such as scale-invariant feature transform, histogram of oriented gradients, wavelet entropy, Hu moment invariants, Fourier descriptors, gray-level co-occurrence matrix, dynamic time warping, principal component analysis, autoencoders, hidden Markov models (HMM), support vector machines (SVM), random forests, skin color modeling, k-NN, artificial neural networks, convolutional neural networks (CNN), and transfer learning, are discussed in detail, organized by the major stages of data acquisition, preprocessing, feature extraction, and classification. CSLR is summarized in terms of classification and feature extraction methods, accuracy/performance evaluation, and sample size/datasets, and the advantages and limitations of different CSLR approaches are compared. Data acquisition is found to rely mainly on Kinect and cameras, and feature extraction focuses on hand shape and spatiotemporal factors while ignoring facial expressions. HMM and SVM are the most widely used classifiers. CNNs are becoming increasingly popular, and deep neural network-based recognition is likely to be the future trend. However, owing to the complexity of the contemporary Chinese language, CSLR generally achieves lower accuracy than SLR for other sign languages.
It is necessary to establish an appropriate dataset for conducting comparable experiments, and the problem of accuracy decreasing as the dataset grows needs to be resolved. Overall, we hope this study gives a comprehensive overview for those interested in CSLR and SLR and further contributes to future research.
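
Dynamic time warping, one of the techniques the survey reviews, aligns two sequences that unfold at different speeds, which makes it a natural fit for signs performed faster or slower by different signers. The sketch below is the textbook DTW recurrence with a nearest-template classifier; representing a sign as a trajectory of 2D hand positions, and the template names, are illustrative assumptions rather than details from the surveyed papers.

```python
import math
from typing import Dict, Sequence

Point = Sequence[float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic O(len(a) * len(b)) DTW with Euclidean local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a only, advance in b only, or both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def nearest_template(query: Sequence[Point], templates: Dict[str, Sequence[Point]]) -> str:
    """Label the query with the template sign at the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))
```

A repeated point in one sequence (a signer pausing mid-gesture) costs nothing extra when it matches, which is exactly the speed invariance that plain frame-by-frame comparison lacks. `math.dist` requires Python 3.8 or later.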

“…In recent years, many researchers have shifted their attention from traditional methods to convolutional neural networks (CNNs) [13][14][15], since CNNs have achieved remarkable success in many important computer vision tasks, such as classification, detection, and recognition. Many approaches have been proposed for tiny-face detection, which aims to find tiny faces in a whole image, especially a low-resolution one.…”

Section: Introduction (mentioning)

confidence: 99%

Real-Time Pre-Identification and Cascaded Detection for Tiny Faces

Yang

Min

et al. 2019

Applied Sciences

Although face detection has been studied for decades, finding tiny faces in a whole image is still challenging, especially in low-resolution images. Traditional face detection methods rely on hand-crafted features, but the features of tiny faces differ from those of normal-sized faces, so detection robustness cannot be guaranteed. To alleviate this problem, we propose a pre-identification mechanism and a cascaded detector (PMCD) for tiny-face detection. The pre-identification mechanism greatly reduces background and other irrelevant information. The cascaded detector uses two stages of deep convolutional neural networks (CNNs) to detect tiny faces in a coarse-to-fine manner: face-area candidates are first pre-identified as regions of interest (RoIs) using a real-time pedestrian detector and the pre-identification mechanism, and the set of RoI candidates, rather than the whole image, is then fed to the second sub-network. Thanks to this mechanism, the second sub-network can be a shallow network that maintains both high accuracy and real-time performance. The accuracy of PMCD is at least 4% higher than that of other state-of-the-art methods on tiny-face detection, while remaining real-time.
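
The coarse-to-fine idea in the PMCD abstract, where a second stage scores only pre-identified candidate regions instead of the whole image, can be sketched as control flow. The abstract does not give the pre-identification rule, so `head_roi`, its `head_frac=0.25` parameter, and the 0.5 threshold are hypothetical; `face_score` stands in for the shallow second-stage CNN, and the pedestrian boxes are assumed to come from an external real-time detector.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def head_roi(person: Box, head_frac: float = 0.25) -> Box:
    """Hypothetical pre-identification rule: assume the face lies in the
    top head_frac of a pedestrian bounding box."""
    return Box(person.x, person.y, person.w, person.h * head_frac)


def cascade_detect(
    pedestrians: List[Box],
    face_score: Callable[[Box], float],
    threshold: float = 0.5,
) -> List[Box]:
    """Stage 1 output (pedestrian boxes) -> face-area RoIs -> stage 2 scoring.
    The second stage only ever sees the RoIs, never the whole image."""
    rois = [head_roi(p) for p in pedestrians]
    return [r for r in rois if face_score(r) >= threshold]
```

Because the second stage evaluates a handful of small regions rather than every location in the frame, it can afford to be shallow, which is the trade-off the abstract credits for keeping PMCD real-time.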
