“…Human pose estimation systems are used to extract features in seven works (37%) [38,34,9,49,37,85]. The estimated poses can be the sole inputs to the translation model [38,37,49], or augment other spatial or spatiotemporal features [34,9,85].…”
Section: Sign Language Representations
“…As one paper may discuss several tasks, the total count is higher than the amount of papers. 1, 57, 51] and transformers also in 12 papers [80,82,46,84,9,11,38,49,37,44,18,81]. Within the RNN based models, several attention schemes are used: no attention, Luong attention [42] and Bahdanau attention [3].…”
Section: Sign Language Translation Models
“…Sign2Gloss2Text models are proposed in 7 papers (22%) [64,23,10,39,80,11,84]. Sign2(Gloss+Text) models are found 3 times (11%) within the reviewed papers [11,85,18] and Sign2Text models 12 times (38%) [33,10,38,83,34,9,49,37,11,57,82,56,84].…”
Section: Tasks
“…Note that the largest dataset in terms of number of parallel sentences, ASLG-PC12, contains 827 thousand training sentences. For MT between spoken languages, datasets typically contain several millions of sentences. 12 papers (37.5%) use custom datasets that are not publicly available [38,33,34,37,43,62,45,56,41,7,57,44], limiting further analysis of their results as they cannot be compared directly to other papers.…”
Automatic translation from signed to spoken languages is an interdisciplinary research domain, lying at the intersection of computer vision, machine translation and linguistics. Nevertheless, research in this domain is performed mostly by computer scientists in isolation. As the domain is becoming increasingly popular (the majority of scientific papers on the topic of sign language translation have been published in the past three years), we provide an overview of the state of the art as well as some required background in the different related disciplines. We give a high-level introduction to sign language linguistics and machine translation to illustrate the requirements of automatic sign language translation. We present a systematic literature review to illustrate the state of the art in the domain and then, harking back to the requirements, lay out several challenges for future research. We find that significant advances have been made on the shoulders of spoken language machine translation research. However, current approaches are often not linguistically motivated or are not adapted to the different input modality of sign languages. We explore challenges related to the representation of sign language data, the collection of datasets, the need for interdisciplinary research and requirements for moving beyond research, towards applications. Based on our findings, we advocate for interdisciplinary research and for basing future research on linguistic analysis of sign languages. Furthermore, the inclusion of deaf and hearing end users of sign language translation applications in use case identification, data collection and evaluation is of the utmost importance in the creation of useful sign language translation models. We recommend iterative, human-in-the-loop design and development of sign language translation models.
“…In their latest work, Camgoz et al. [97] adopted additional modalities and cross-modal attention to synchronize the different streams and model both inter- and intra-contextual information. Kim et al. [98] used a deep neural network to extract human keypoints, which were fed to a transformer encoder-decoder network; the keypoints were normalized based on the neck location. A comparison of existing methods for SLT evaluated on the Phoenix-2014-T dataset is shown in Table 4.…”
AI technologies can play an important role in breaking down the communication barriers between deaf or hearing-impaired people and other communities, contributing significantly to their social inclusion. Recent advances in both sensing technologies and AI algorithms have paved the way for the development of various applications aimed at fulfilling the needs of deaf and hearing-impaired communities. To this end, this survey aims to provide a comprehensive review of state-of-the-art methods in sign language capturing, recognition, translation and representation, pinpointing their advantages and limitations. In addition, the survey presents a number of applications and discusses the main challenges in the field of sign language technologies. Future research directions are also proposed to assist prospective researchers in further advancing the field.
Automatic translation from signed to spoken languages is an interdisciplinary research domain on the intersection of computer vision, machine translation (MT), and linguistics. While the domain is growing in terms of popularity—the majority of scientific papers on sign language (SL) translation have been published in the past five years—research in this domain is performed mostly by computer scientists in isolation. This article presents an extensive and cross-domain overview of the work on SL translation. We first give a high-level introduction to SL linguistics and MT to illustrate the requirements of automatic SL translation. Then, we present a systematic literature review of the state of the art in the domain. Finally, we outline important challenges for future research. We find that significant advances have been made on the shoulders of spoken language MT research. However, current approaches often lack linguistic motivation or are not adapted to the different characteristics of SLs. We explore challenges related to the representation of SL data, the collection of datasets and the evaluation of SL translation models. We advocate for interdisciplinary research and for grounding future research in linguistic analysis of SLs. Furthermore, the inclusion of deaf and hearing end users of SL translation applications in use case identification, data collection, and evaluation is of utmost importance in the creation of useful SL translation models.