2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
DOI: 10.1109/iccvw.2017.365
Gesture and Sign Language Recognition with Temporal Residual Networks

Abstract: Gesture and sign language recognition in a continuous video stream is a challenging task, especially with a large vocabulary. In this work, we approach this as a framewise classification problem. We tackle it using temporal convolutions and recent advances in the deep learning field like residual networks, batch normalization and exponential linear units (ELUs). The models are evaluated on three different […]
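The abstract's core idea — framewise classification of a video stream with temporal convolutions, residual connections, and ELUs — can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes, not the authors' architecture: batch normalization is omitted, and the layer widths, kernel size, and linear output head are invented for the example.

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha*(exp(x)-1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(x))

def temporal_conv(x, w, b):
    """1-D convolution over the time axis with 'same' zero padding.

    x: (T, C_in) framewise features; w: (K, C_in, C_out); b: (C_out,).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack(
        [np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1])) for t in range(x.shape[0])]
    ) + b

def temporal_residual_block(x, w1, b1, w2, b2):
    """Pre-activation residual block over time: x + conv(elu(conv(elu(x))))."""
    h = temporal_conv(elu(x), w1, b1)
    h = temporal_conv(elu(h), w2, b2)
    return x + h

# Toy framewise classification over a short clip (all sizes are illustrative).
rng = np.random.default_rng(0)
T, C, K, n_classes = 16, 8, 3, 5              # frames, channels, kernel width, classes
x = rng.standard_normal((T, C))               # one feature vector per video frame
w1, w2 = (rng.standard_normal((K, C, C)) * 0.1 for _ in range(2))
b1 = b2 = np.zeros(C)
feats = temporal_residual_block(x, w1, b1, w2, b2)   # (T, C): same length as the input
w_out = rng.standard_normal((C, n_classes)) * 0.1
logits = feats @ w_out                               # (T, n_classes): a label per frame
print(logits.shape)
```

Because the temporal convolutions use 'same' padding, the block preserves the number of frames, which is what makes per-frame labeling of a continuous stream possible without segmenting the video first.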

Cited by 50 publications
(32 citation statements)
References 17 publications
“…CNNs have also been used in attempts to resolve the challenging task of gesture and sign language recognition in a continuous video stream. For instance, Pigou et al [126] used a deep learning approach and temporal convolutions to address this problem. The CNN model featured certain improvements that made it easier to conduct the classification process.…”
Section: Deep Learning Techniques
confidence: 99%
“…However, this approach proved to be extremely challenging because the animations were difficult to work with after processing. While exploring the challenges of continuous translation, Pigou et al [126] observed that deep residual networks can be used to learn patterns in continuous videos containing gestures and signs. The use of deep residual networks can minimize the need for preprocessing.…”
Section: A. SLR Continuous Models
confidence: 99%
“…Although there has been much research related to technologies for the deaf or hard of hearing (HoH) over the past three decades, much of this work has focused on the translation of sign language into voice or text using camera-based or wearable devices. Although sensor-augmented gloves [1]–[3] have been reported to typically yield higher gesture recognition rates than camera-based systems [4]–[6], they cannot capture the intricacies of sign languages presented through head and body movements. In contrast, video can capture facial expressions, but requires adequate light and a direct line-of-sight to be effective.…”
Section: Introduction
confidence: 99%
“…Because it is much easier to recruit hearing participants than deaf participants, many studies on ASL recognition (e.g. [6], [11]–[13]) have used imitation signing data, despite its differences from native ASL data.…”
Section: Introduction
confidence: 99%