ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054316

Key Action and Joint CTC-Attention based Sign Language Recognition

Cited by 15 publications (5 citation statements); references 8 publications.
“…The results are presented in Figures 11–13, respectively. For comparison purposes, we include methods such as [15] (Gao et al., 2021), [22] (Pu et al., 2018), [23] (Huang et al., 2018), [24] (Wang et al., 2018a), [25] (Venugopalan et al., 2015b), [26] (Luong et al., 2015), [27] (Chan et al., 2016), [28] (Bin et al., 2018), [29] (Camgoz et al., 2020), [30] (Li et al., 2020), [31] (Lafferty et al., 2001), [32] (Morency et al., 2007), [33] (Zhang et al., 2014), [34] (Venugopalan and Rohrbach, 2015a), [35] (Pan et al., 2016) and [36] (Guo et al., 2019). We observe that the proposed Savitar model outperforms the state of the art by a large margin on all three datasets, indicating the superiority of our model.…”
Section: Methods
Citation type: mentioning (confidence: 99%)
“…Huang et al. [22] introduced an attention mechanism into the encoder-decoder architecture to measure the influence of every input on the current decoding position. In addition, to effectively learn the multi-level information in sign language data, several works [23–25] adopted hierarchical models to explore the information between levels. Zhou et al. [8] proposed a Spatial-Temporal Multi-Cue (STMC) network to explore the effective complementarity of multimodal information.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
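The encoder-decoder attention idea referenced in [22], where every encoder input is weighted according to its influence on the current decoding position, can be illustrated with a short sketch. The following is a minimal PyTorch example of Bahdanau-style additive attention over encoder states; the class name `AdditiveAttention` and all layer sizes are illustrative assumptions, not the exact mechanism used in the cited work.

```python
# Minimal sketch of additive (Bahdanau-style) encoder-decoder attention.
# Names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)  # project encoder states
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)  # project decoder state
        self.v = nn.Linear(attn_dim, 1, bias=False)            # score each time step

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, T, enc_dim) -- all input frames
        # dec_state:  (batch, dec_dim)    -- current decoding position
        scores = self.v(torch.tanh(self.w_enc(enc_states)
                                   + self.w_dec(dec_state).unsqueeze(1)))  # (batch, T, 1)
        weights = torch.softmax(scores, dim=1)        # influence of every input frame
        context = (weights * enc_states).sum(dim=1)   # weighted sum -> context vector
        return context, weights.squeeze(-1)

# Example usage with assumed dimensions:
attn = AdditiveAttention(enc_dim=512, dec_dim=256, attn_dim=128)
context, weights = attn(torch.randn(2, 100, 512), torch.randn(2, 256))
```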
“…When conducting experiments, the CSL dataset was partitioned following previous work [25], while the RWTH-Phoenix-Weather 2014 multi-signer dataset follows the settings adopted in a previous study [36] and is divided into a training set, a test set and a development (dev) set.…”
Section: Datasets
Citation type: mentioning (confidence: 99%)
“…Furthermore, a hierarchical BLSTM with attention over sliding windows was used in the decoder to weigh the importance of the input frames. Li et al. [46] used a pyramid structure of BLSTMs to find key actions in the video representations produced by the 2D-CNN. Moreover, an attention-based LSTM was used to align the input and output sequences, and the whole network was trained jointly with cross-entropy and CTC losses.…”
Section: Sign Language Recognition
Citation type: mentioning (confidence: 99%)
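The joint training described above, which combines the attention decoder's cross-entropy objective with a CTC objective, amounts to optimizing a weighted sum of the two losses. Below is a minimal PyTorch sketch under assumed tensor shapes; the interpolation weight `ctc_weight` and the helper `joint_loss` are illustrative, not the exact configuration from [46].

```python
# Minimal sketch of a joint CTC / cross-entropy objective.
# Shapes, the blank index, and the weighting factor are assumptions.
import torch
import torch.nn as nn

ctc_loss_fn = nn.CTCLoss(blank=0, zero_infinity=True)
ce_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks padded targets
ctc_weight = 0.3  # assumed interpolation factor between the two objectives

def joint_loss(ctc_log_probs, input_lengths, ctc_targets, target_lengths,
               attn_logits, attn_targets):
    # ctc_log_probs: (T, batch, vocab) log-probabilities from the CTC branch
    # attn_logits:   (batch, L, vocab) logits from the attention decoder
    loss_ctc = ctc_loss_fn(ctc_log_probs, ctc_targets, input_lengths, target_lengths)
    loss_ce = ce_loss_fn(attn_logits.reshape(-1, attn_logits.size(-1)),
                         attn_targets.reshape(-1))
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_ce
```

In this kind of joint objective, the CTC term encourages a monotonic alignment between input frames and output labels, while the cross-entropy term supervises the attention decoder; the weight trades off the two signals.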