Recently, automatic hand gesture recognition has gained increasing importance for two principal reasons: the growth of the deaf and hearing-impaired population, and the spread of vision-based applications and touchless control on ubiquitous devices. Because hand gesture recognition is at the core of sign language analysis, a robust hand gesture recognition system should consider both spatial and temporal features. Unfortunately, finding discriminative spatiotemporal descriptors for a hand gesture sequence is not a trivial task. In this study, we propose an efficient deep convolutional neural network approach for hand gesture recognition. The proposed approach employs transfer learning to overcome the scarcity of large labeled hand gesture datasets. We evaluated it on three gesture datasets of color videos, using 40, 23, and 10 classes, respectively. In the signer-dependent mode, the approach obtained recognition rates of 98.12%, 100%, and 76.67% on the three datasets; in the signer-independent mode, it obtained recognition rates of 84.38%, 34.9%, and 70%, respectively. INDEX TERMS: 3DCNN, computer vision, deep learning, hand gesture recognition, sign language recognition, transfer learning.
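The abstract above does not give implementation details, but the transfer-learning idea it names can be sketched in a minimal, hypothetical form: a pretrained backbone (not shown) is frozen and used as a fixed feature extractor, and only a small classification head is trained on the scarce labeled gesture data. All names, dimensions, and the synthetic data below are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a pretrained 3D CNN (not shown) is frozen and maps each
# gesture clip to a fixed-length feature vector; synthetic stand-ins here.
num_clips, feat_dim, num_classes = 120, 64, 4
features = rng.normal(size=(num_clips, feat_dim))     # frozen backbone outputs
true_W = rng.normal(size=(feat_dim, num_classes))     # synthetic label model
labels = (features @ true_W).argmax(axis=1)           # gesture class ids

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_head(X, y, classes, lr=0.5, epochs=200):
    """Fit only the classification head; backbone weights stay untouched."""
    W = np.zeros((X.shape[1], classes))
    onehot = np.eye(classes)[y]
    for _ in range(epochs):
        probs = softmax(X @ W)
        W -= lr * X.T @ (probs - onehot) / len(X)     # gradient step, head only
    return W

W = train_head(features, labels, num_classes)
preds = softmax(features @ W).argmax(axis=1)
train_acc = (preds == labels).mean()
```

Training only the head keeps the number of learned parameters small, which is what makes transfer learning attractive when labeled gesture data are scarce.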
Hand gesture recognition is an attractive research field with a wide range of applications, including video games and telesurgery. Another important application of hand gesture recognition is the translation of sign language, which is a complicated structured form of hand gestures. In sign language, the fingers' configuration, the hand's orientation, and the hand's position relative to the body are the primitives of structured expressions. The importance of hand gesture recognition has increased with the prevalence of touchless applications and the rapid growth of the hearing-impaired population. However, developing an efficient recognition system requires overcoming the challenges of hand segmentation, local hand shape representation, global body configuration representation, and gesture sequence modeling. In this paper, a novel system is proposed for dynamic hand gesture recognition using multiple deep learning architectures for hand segmentation, local and global feature representation, and sequence feature globalization and recognition. The proposed system is evaluated on a very challenging dataset, which consists of 40 dynamic hand gestures performed by 40 subjects in an uncontrolled environment. The results show that the proposed system outperforms state-of-the-art approaches, demonstrating its effectiveness.
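The staged pipeline this abstract names (segmentation, local and global per-frame features, then sequence globalization) can be sketched with toy stand-ins. Each function below is a hypothetical stub for what the paper implements as a deep network; only the data flow between the stages reflects the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

def segment_hand(frame):
    """Stub segmenter: returns a binary hand mask for one frame."""
    return frame > frame.mean()

def local_features(frame, mask):
    """Stub local hand-shape descriptor from the masked hand region."""
    hand = frame * mask
    return np.array([hand.mean(), hand.std()])

def global_features(frame):
    """Stub global body-configuration descriptor from the full frame."""
    return np.array([frame.mean(), frame.std()])

def globalize(per_frame):
    """Sequence feature globalization via simple temporal average pooling."""
    return np.mean(per_frame, axis=0)

video = rng.random((16, 32, 32))                  # 16 frames, 32x32 grayscale
per_frame = np.stack([
    np.concatenate([local_features(f, segment_hand(f)), global_features(f)])
    for f in video
])
clip_descriptor = globalize(per_frame)            # one vector per gesture clip
```

The point of the sketch is the architecture's shape: per-frame local and global descriptors are concatenated, and the temporal dimension is collapsed into a single clip-level representation before recognition.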
Sign language is the main channel for hearing-impaired people to communicate with others. It is a visual language that conveys highly structured combinations of manual and non-manual parameters, and it therefore takes hearing people considerable effort to master. Sign language recognition aims to ease this difficulty and bridge the communication gap between hearing-impaired people and others. This study presents an efficient architecture for sign language recognition based on a graph convolutional network (GCN). The presented architecture consists of a few separable 3D GCN layers, which are enhanced by a spatial attention mechanism. The limited number of layers enables the proposed architecture to avoid the over-smoothing problem common in deep graph neural networks. Furthermore, the attention mechanism enhances the spatial context representation of the gestures. The proposed architecture is evaluated on different datasets and shows outstanding results.
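The building blocks this abstract names, a spatial graph convolution over skeleton joints followed by spatial attention, can be illustrated with a minimal numpy sketch. The normalization follows the standard symmetric GCN formulation; the toy skeleton, weights, and attention scoring are assumptions for illustration, not the paper's separable 3D GCN layer.

```python
import numpy as np

rng = np.random.default_rng(2)

def gcn_layer(A, H, W):
    """Symmetrically normalized graph convolution: relu(D^-1/2 Â D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

def spatial_attention(H):
    """Toy per-joint attention: scale each joint's features by a softmax score."""
    scores = H.sum(axis=1)
    scores = np.exp(scores - scores.max())
    return H * (scores / scores.sum())[:, None]

A = np.array([[0, 1, 0],                          # 3-joint toy skeleton chain
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = rng.normal(size=(3, 4))                       # per-joint input features
W = rng.normal(size=(4, 4))                       # learnable layer weights
out = spatial_attention(gcn_layer(A, H, W))       # one attentive GCN layer
```

Stacking many such layers makes node features converge toward each other (over-smoothing), which is why the abstract emphasizes using only a few layers.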
This paper presents a novel Arabic Sign Language (ArSL) recognition system that uses selected 2D hand and body key points from successive video frames. The system recognizes the recorded video signs, in both signer-dependent and signer-independent modes, using the concatenation of a 3D CNN skeleton network and a 2D point convolution network. To accomplish this, we built a new ArSL video-based sign database. We present the detailed methodology of recording the new dataset, which comprises 80 static and dynamic signs, each repeated five times by 40 signers. The signs include the Arabic alphabet, numbers, and some daily-use signs. To facilitate building an online sign recognition system, we introduce the inverse efficiency score to find a sufficient optimal number of successive frames for the recognition decision. This supports a near real-time automatic ArSL system, where the tradeoff between accuracy and speed is crucial to avoid delayed sign classification. In the signer-dependent mode, the best results were accuracies of 98.39% for dynamic signs and 88.89% for static signs; in the signer-independent mode, we obtained 96.69% for dynamic signs and 86.34% for static signs. When the static and dynamic signs were mixed and the system was trained on all the signs, accuracies of 89.62% and 88.09% were obtained in the signer-dependent and signer-independent modes, respectively.
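The inverse efficiency score mentioned above can be illustrated with its classic form: mean response latency divided by accuracy, so that lower is better and the score penalizes both slow and inaccurate settings. The measurements below are invented for illustration, and the paper's exact formulation over frame counts may differ.

```python
def inverse_efficiency(latency_s, accuracy):
    """Classic IES: mean latency divided by proportion correct (lower is better)."""
    return latency_s / accuracy

# Assumed measurements: more frames -> higher accuracy but longer latency.
candidates = {
    10: (0.10, 0.80),   # frames: (latency in seconds, accuracy)
    20: (0.20, 0.95),
    30: (0.30, 0.97),
}
scores = {n: inverse_efficiency(t, a) for n, (t, a) in candidates.items()}
best_n = min(scores, key=scores.get)   # frame count with the best tradeoff
```

With these assumed numbers, 10 frames scores 0.10 / 0.80 = 0.125, which beats 20 frames (about 0.211) and 30 frames (about 0.309), so the score would select the shortest window despite its lower accuracy.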