2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01004
Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation

Abstract: Prior work on Sign Language Translation has shown that having a mid-level sign gloss representation (effectively recognizing the individual signs) improves the translation performance drastically. In fact, the current state-of-the-art in translation requires gloss level tokenization in order to work. We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. This is achieved by using a Connectionist …
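The abstract describes a shared transformer encoder trained jointly: a CTC head performs continuous sign language recognition over glosses while an autoregressive decoder produces the spoken-language translation. As a rough illustration only, here is a minimal PyTorch sketch of that joint objective; the module names, layer counts, vocabulary sizes, and the assumption of unpadded frame inputs are all illustrative, not the authors' released code.

```python
# Minimal sketch of a joint CTC + translation objective, assuming
# pre-extracted frame features. All names and sizes are illustrative.
import torch
import torch.nn as nn

class SignLanguageTransformer(nn.Module):
    def __init__(self, feat_dim=1024, d_model=512, gloss_vocab=1100, text_vocab=2900):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=3)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=3)
        self.gloss_head = nn.Linear(d_model, gloss_vocab)  # recognition (CTC) head
        self.text_embed = nn.Embedding(text_vocab, d_model)
        self.text_head = nn.Linear(d_model, text_vocab)    # translation head

    def forward(self, frames, text_in):
        memory = self.encoder(self.embed(frames))          # (B, T, d_model)
        gloss_logits = self.gloss_head(memory)             # per-frame gloss scores
        causal = nn.Transformer.generate_square_subsequent_mask(text_in.size(1))
        dec = self.decoder(self.text_embed(text_in), memory, tgt_mask=causal)
        return gloss_logits, self.text_head(dec)

# Joint loss: CTC binds recognition into the same end-to-end computation
# graph as the cross-entropy translation loss. Index 0 is assumed to be
# both the CTC blank and the text padding id (an assumption, not a given).
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
xent = nn.CrossEntropyLoss(ignore_index=0)

def joint_loss(model, frames, glosses, gloss_lens, text_in, text_out, lam=1.0):
    gloss_logits, text_logits = model(frames, text_in)
    log_probs = gloss_logits.log_softmax(-1).transpose(0, 1)  # (T, B, V) for CTC
    in_lens = torch.full((frames.size(0),), frames.size(1), dtype=torch.long)
    loss_rec = ctc(log_probs, glosses, in_lens, gloss_lens)
    loss_tr = xent(text_logits.reshape(-1, text_logits.size(-1)), text_out.reshape(-1))
    return loss_rec + lam * loss_tr  # lam weights translation vs. recognition
```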

Cited by 264 publications (165 citation statements). References 44 publications. Citation types: 2 supporting, 119 mentioning, 0 contrasting.
“…However, this claim assumes that the ground truth gloss annotations give a full understanding of sign language, which ignores the information bottleneck in glosses. Camgoz et al. (2020) hypothesize that it is therefore possible to surpass G2T performance without using GT glosses, which we confirm in this section.…”
Section: German Sign2Gloss2Text (S2G2T). Citation type: supporting
confidence: 83%
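For context on this quote: in the two-stage Sign2Gloss2Text (S2G2T) pipeline the translation step sees only the predicted gloss sequence, so whatever the gloss annotation drops is lost, which is the bottleneck the citing authors point to; end-to-end Sign2Text conditions directly on the video features. A hypothetical sketch of the contrast, with placeholder function names that are not a real API:

```python
# Two-stage S2G2T: text is conditioned on discrete glosses alone, a
# lossy mid-level representation (the "information bottleneck").
def sign2gloss2text(frames, recognizer, g2t_translator):
    glosses = recognizer(frames)      # frames -> gloss sequence
    return g2t_translator(glosses)    # glosses -> spoken-language text

# End-to-end S2T: text is conditioned on the full visual signal.
def sign2text(frames, s2t_translator):
    return s2t_translator(frames)     # frames -> spoken-language text
```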
“…Also, we report an improvement of over 5 BLEU-4 over the state-of-the-art. A single Transformer alone gives an improvement of more than 4 BLEU-4 over the state-of-the-art, which shows the advantage of Transformers for SLT, as also shown in Camgoz et al. (2020). We also use 5 of the best models from our experiments on ASLG-PC12 in an ensemble.…”
Section: Ensemble Decoding. Citation type: mentioning
confidence: 71%
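The ensemble mentioned in this quote can be realized by averaging the per-step output distributions of several independently trained models during decoding. Below is a minimal sketch, assuming models shaped like the joint architecture sketched earlier; greedy search is shown for brevity, though the same averaging plugs into beam search, and nothing here is claimed to match the citing authors' exact setup.

```python
# Ensemble decoding sketch: each model scores the next token and their
# softmax distributions are averaged before the argmax. Illustrative only.
import torch

@torch.no_grad()
def ensemble_greedy_decode(models, frames, bos_id, eos_id, max_len=60):
    ys = torch.full((frames.size(0), 1), bos_id, dtype=torch.long)
    # Encode the video features once per ensemble member.
    memories = [m.encoder(m.embed(frames)) for m in models]
    for _ in range(max_len):
        causal = torch.nn.Transformer.generate_square_subsequent_mask(ys.size(1))
        step_probs = []
        for m, mem in zip(models, memories):
            dec = m.decoder(m.text_embed(ys), mem, tgt_mask=causal)
            step_probs.append(m.text_head(dec[:, -1]).softmax(-1))
        avg = torch.stack(step_probs).mean(0)   # average ensemble distributions
        nxt = avg.argmax(-1, keepdim=True)      # most probable next token
        ys = torch.cat([ys, nxt], dim=1)
        if (nxt == eos_id).all():               # stop once every sequence ends
            break
    return ys
```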