2022
DOI: 10.1109/access.2022.3148132

American Sign Language Words Recognition Using Spatio-Temporal Prosodic and Angle Features: A Sequential Learning Approach

Abstract: Most of the available American Sign Language (ASL) words share similar characteristics. These characteristics are usually during sign trajectory which yields similarity issues and hinders ubiquitous application. However, recognition of similar ASL words confused translation algorithms, which lead to misclassification. In this paper, based on fast fisher vector (FFV) and bi-directional Long-Short Term memory (Bi-LSTM) method, a large database of dynamic sign words recognition algorithm called bidirectional long…

citations
Cited by 37 publications
(30 citation statements)
references
References 60 publications
“…The second phase of the BiRNN layers is trained to learn output of the previous layers to be initial state of first layers and yields an output vector [symbol not captured], defined as [equation not captured]. Finally, BiRNN extraction layers can be written uniformly as [43]: [equation not captured], where the dominant and non-dominant hand index is denoted as n, and [symbols not captured] denote forward and backward pass hidden state vectors, respectively. In Equation (26), the extraction layers of BiRNN not only give the relationship of video input features vector but also correlate to state of prior sequence.…”
Section: Methods (mentioning)
confidence: 99%
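
The idea in the statement above, pairing a forward-pass and a backward-pass hidden state at every time step, can be illustrated with a minimal sketch. This is not the cited paper's Equation (26), which did not survive extraction; the elementwise tanh recurrence, the scalar weights `w_in` and `w_rec`, and the function names are all hypothetical choices made only to keep the example small.

```python
import math

def rnn_pass(inputs, w_in, w_rec, reverse=False):
    """Run a toy elementwise tanh RNN over a non-empty sequence.

    inputs: list of T feature vectors (lists of floats) of equal length.
    w_in, w_rec: hypothetical scalar weights (assumption, not from the paper).
    Returns one hidden-state vector per time step, in original time order.
    """
    steps = range(len(inputs))
    order = reversed(steps) if reverse else steps
    hidden = [0.0] * len(inputs[0])  # zero initial state
    states = [None] * len(inputs)
    for t in order:
        hidden = [math.tanh(w_in * x + w_rec * h)
                  for x, h in zip(inputs[t], hidden)]
        states[t] = hidden
    return states

def bidirectional_states(inputs, w_in=0.5, w_rec=0.3):
    """Concatenate forward and backward hidden states at each time step."""
    forward = rnn_pass(inputs, w_in, w_rec)
    backward = rnn_pass(inputs, w_in, w_rec, reverse=True)
    return [f + b for f, b in zip(forward, backward)]  # list concatenation

# Three time steps of two-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for t, state in enumerate(bidirectional_states(frames)):
    print(t, [round(v, 3) for v in state])
```

Each output vector is twice as long as the input vector: the first half depends only on earlier frames, the second half only on later ones.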
“…Multi-stacked BiLSTM is trained to obtain output probability vectors for all of its corresponding input vectors, predicted word classes, and confusion matrices. Multi-stacked layers are initialized with weight of extracted features, as follows [43]: …”
Section: Methods (mentioning)
confidence: 99%
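
The three outputs named in the statement above (probability vectors, predicted word classes, and a confusion matrix) are related by a standard post-processing chain, sketched below. The sketch assumes the network emits one raw score per class for each input; the scores, labels, and function names are invented for illustration and are not taken from the cited work.

```python
import math

def softmax(scores):
    """Turn raw class scores into a probability vector."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(prob_vector):
    """Return the index of the most probable class."""
    return max(range(len(prob_vector)), key=prob_vector.__getitem__)

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples with rows as true classes and columns as predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[truth][guess] += 1
    return matrix

# Hypothetical scores for four samples over three word classes.
scores = [[2.0, 0.1, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 0.3], [1.0, 0.9, 0.1]]
true_labels = [0, 1, 1, 0]

probabilities = [softmax(s) for s in scores]
predictions = [predict(p) for p in probabilities]
matrix = confusion_matrix(true_labels, predictions, num_classes=3)
print(predictions)
print(matrix)
```

Off-diagonal entries of the matrix show which word classes are mistaken for one another, which is the kind of confusion between similar signs that the abstract describes.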
“…Sign language is a medium of communication for hearing-impaired people, a group which includes 466 million people worldwide [1], and it is expressed using the fingers and hands. American Sign Language is the most populous among its peers, and thus consists of over ten thousand word gestures [2]. Moreover, 65% of ASL gestures represent sign words during a full conversation [3].…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, these features are used with a forehand view, which leads to misclassification; therefore, it is difficult to apply this method with a backhand view. The authors in [2,3,6,11,12] extracted hand features using a backhand view approach, which led to outstanding performance; however, these studies may fail to recognize words with a similar shape, rotation, and movement (SRM words). To address this problem with the existing works of [2,3,6,11,12], in this paper we propose the spatial-temporal body parts and hand relationship patterns (ST-BHR patterns) as the main feature of analysis for 72 isolated signed words in the SRM group [13] based on the backhand approach.…”
Section: Introduction (mentioning)
confidence: 99%