“…Ten papers use 3D CNNs for feature extraction [35,66,67,76,82,83,86,88,96,99]. These networks are able to extract spatio-temporal features, leveraging the temporal relations between neighboring frames in video data.…”
Section: Extraction Methods
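To illustrate why a kernel that spans the time axis captures the temporal relations between neighboring frames, here is a minimal, hypothetical sketch of a single-channel 3D convolution in NumPy. The cited papers use full deep 3D CNN architectures; the function name, the toy video, and the temporal-difference kernel below are illustrative stand-ins only.

```python
import numpy as np

def conv3d_single(video, kernel):
    """Naive valid 3D cross-correlation of a single-channel video
    (T, H, W) with a kernel (kt, kh, kw).

    Unlike a 2D convolution applied frame by frame, the kernel also
    spans the time axis, so each output value mixes information from
    neighboring frames -- the spatio-temporal feature extraction that
    3D CNNs perform (with learned kernels) at scale.
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(video[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

# A toy "video": 8 frames of 16x16 pixels.
video = np.random.rand(8, 16, 16)
# A hand-crafted temporal-difference kernel: responds to change
# between consecutive frames, and is zero on a static video.
kernel = np.zeros((2, 3, 3))
kernel[0] = -1.0 / 9
kernel[1] = 1.0 / 9
features = conv3d_single(video, kernel)
print(features.shape)  # (7, 14, 14)
```

On a completely static video this kernel produces an all-zero feature map, which is one way to see that the output genuinely encodes motion, not just appearance.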
“…A simple approach to feature extraction is to consider full video frames as inputs. Performing further pre-processing of the visual information to target hands, face, and pose information separately (referred to as a multi-cue approach) improves the performance of SLT models [36,59,65,75,80,86,96]. Zheng et al. [75] show through qualitative analysis that adding facial feature extraction improves translation accuracy in utterances where facial expressions are used.…”
Section: Multi-cue Approaches
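The multi-cue idea can be sketched as follows: crop the regions carrying each cue and extract a feature per cue before fusing them. Everything here is an invented stand-in, assuming detector-provided bounding boxes and using an intensity histogram in place of the learned CNN features a real SLT model would compute.

```python
import numpy as np

def crop(frame, box):
    """Crop a (H, W, C) frame to a (top, left, height, width) box."""
    top, left, h, w = box
    return frame[top:top+h, left:left+w]

def cue_features(patch, bins=8):
    """Stand-in feature extractor: an intensity histogram per patch.
    A real multi-cue SLT model would run a CNN on each cue instead."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0), density=True)
    return hist

frame = np.random.rand(256, 256, 3)
# Hypothetical detector output: boxes for the face and both hands.
boxes = {
    "face": (20, 100, 64, 64),
    "left_hand": (150, 40, 48, 48),
    "right_hand": (150, 170, 48, 48),
}
# Extract a feature vector per cue, then fuse by concatenation with
# the full-frame feature, so no single cue has to carry everything.
cues = [cue_features(crop(frame, box)) for box in boxes.values()]
fused = np.concatenate([cue_features(frame)] + cues)
print(fused.shape)  # (32,) = 8 bins x (full frame + 3 cues)
```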
“…Dey et al. [96] observe improvements in BLEU scores when adding lip reading as an input channel. By adding face crops as an additional channel, Miranda et al. [86] improve the performance of the TSPNet architecture [66].…”
Section: Multi-cue Approaches
“…The popularity of the RWTH-PHOENIX-Weather 2014T dataset facilitates the comparison of different SLT models on this dataset. We compare models based on their BLEU-4 score, as this is the only metric consistently reported in all of the papers using RWTH-PHOENIX-Weather 2014T (except [86]). An overview of Gloss2Text models is shown in Table 3.…”
Section: The RWTH-PHOENIX-Weather 2014T Benchmark
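For readers unfamiliar with the metric, a single-reference, sentence-level BLEU-4 can be sketched as below. Published scores are corpus-level and typically computed with standard tooling, so this toy version is for intuition only; the German example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

def bleu4(hypothesis, reference):
    """Sentence-level BLEU-4 with a single reference: the geometric mean
    of clipped 1- to 4-gram precisions, times a brevity penalty that
    punishes hypotheses shorter than the reference."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # no smoothing: any missing n-gram order zeroes the score
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

ref = "am morgen regnet es im norden"
print(bleu4(ref, ref))  # 1.0 for a perfect match
# A truncated hypothesis keeps perfect precisions but is penalized
# by the brevity penalty exp(1 - 6/4):
print(round(bleu4("am morgen regnet es", ref), 3))  # 0.607
```

Because of the 4-gram term and the lack of smoothing, sentence-level scores like these are brittle; corpus-level evaluation aggregates counts over the whole test set before taking the geometric mean.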
“…Zheng et al. [75] use an additional channel of facial information for Sign2Text and obtain an increase of 1.6 BLEU-4 compared to their baseline. Miranda et al. [86] augment TSPNet [66] with face crops, improving the performance of the network.…”
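The "additional channel" idea can be pictured as concatenating per-frame feature vectors from the two streams before they reach the translation encoder. The dimensions below are illustrative assumptions, not the values used in [75] or [86], and the actual fusion mechanisms in those architectures may differ.

```python
import numpy as np

# Hypothetical per-frame features from two streams of the same video:
# a full-frame stream and an added face-crop stream.
T = 10                               # number of frames
body_feats = np.random.rand(T, 512)  # e.g. from a 3D CNN on full frames
face_feats = np.random.rand(T, 128)  # e.g. from a CNN on face crops

# Channel-style fusion: concatenate per frame, so the downstream
# translation encoder sees both cues at every time step.
fused = np.concatenate([body_feats, face_feats], axis=1)
print(fused.shape)  # (10, 640)
```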
Automatic translation from signed to spoken languages is an interdisciplinary research domain at the intersection of computer vision, machine translation (MT), and linguistics. While the domain is growing in popularity, with the majority of scientific papers on sign language (SL) translation published in the past five years, research in this domain is performed mostly by computer scientists in isolation. This article presents an extensive and cross-domain overview of the work on SL translation. We first give a high-level introduction to SL linguistics and MT to illustrate the requirements of automatic SL translation. Then, we present a systematic literature review of the state of the art in the domain. Finally, we outline important challenges for future research. We find that significant advances have been made on the shoulders of spoken language MT research. However, current approaches often lack linguistic motivation or are not adapted to the different characteristics of SLs. We explore challenges related to the representation of SL data, the collection of datasets, and the evaluation of SL translation models. We advocate for interdisciplinary research and for grounding future research in linguistic analysis of SLs. Furthermore, the inclusion of deaf and hearing end users of SL translation applications in use case identification, data collection, and evaluation is of utmost importance in the creation of useful SL translation models.