Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.184
Simultaneous Machine Translation with Visual Context

Abstract: Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible. The translation thus has to start with an incomplete source text, which is read progressively, creating the need for anticipation. In this paper, we seek to understand whether the addition of visual information can compensate for the missing source context. To this end, we analyse the impact of different multimodal approaches and visual features on…
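As a concrete illustration of the streaming setting the abstract describes, the sketch below implements a simple wait-k read/write loop: the decoder starts writing after only k source tokens have been read, i.e. from an incomplete source prefix. The `decode_next` function is a hypothetical placeholder for any incremental translation model, not the paper's method.

```python
# Minimal sketch of SiMT with a fixed wait-k policy: read k source tokens,
# then alternate READ and WRITE. `decode_next` is a hypothetical stand-in
# for an incremental NMT decoder, not the paper's model.

from typing import List


def decode_next(source_prefix: List[str], target_prefix: List[str]) -> str:
    """Emit the next target token from an *incomplete* source prefix.
    A real system would query a trained decoder here."""
    return f"tgt_{len(target_prefix)}"  # placeholder token


def wait_k_translate(source_stream: List[str], k: int = 2) -> List[str]:
    target: List[str] = []
    read = 0
    while len(target) < len(source_stream):  # toy stopping criterion
        # READ: stay exactly k tokens ahead of the output, until exhausted.
        while read < min(len(target) + k, len(source_stream)):
            read += 1
        # WRITE: commit one target token based on what has been read so far.
        target.append(decode_next(source_stream[:read], target))
    return target


print(wait_k_translate("ein Mann spielt Gitarre im Park".split(), k=2))
```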

Cited by 15 publications (16 citation statements)
References 34 publications
“…While most methods employ the attention mechanism to learn to attend to relevant regions in an image, the shortage of annotated data could impair the attention module (see Table 5 (b)). Some recent efforts (Lin et al., 2020; Caglayan et al., 2020) address the issue by feeding models with pre-extracted visual objects instead of the whole image. However, these methods are easily affected by the quality of the extracted objects.…”
Section: Discussion (mentioning, confidence: 99%)
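To make the distinction in the quote concrete, the sketch below shows a decoder state attending over pre-extracted object features (e.g. from an object detector) rather than a whole-image grid. All names, dimensions (36 objects with 2048-d features), and the projection layer are illustrative assumptions, not code from the cited works.

```python
import torch
import torch.nn as nn

# Hedged sketch: cross-attention from one decoder time step over a bag of
# pre-extracted visual object features. Shapes are illustrative assumptions.

d = 256
visual_proj = nn.Linear(2048, d)          # project detector features to model dim
cross_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

obj_feats = torch.randn(1, 36, 2048)      # 36 detected objects, 2048-d each
dec_state = torch.randn(1, 1, d)          # current decoder hidden state

mem = visual_proj(obj_feats)
ctx, weights = cross_attn(dec_state, mem, mem)
print(ctx.shape, weights.shape)           # visual context vector + attention map
```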
“…Details on training are given in Appendix A. We use pysimt (Caglayan et al., 2020) with PyTorch (Paszke et al., 2019) v1.4 for our experiments.…”
Section: Training (mentioning, confidence: 99%)
“…On the other hand, models with the fixed policy have a much simpler architecture and lower latency compared to more complicated models with the adaptive policy. As for the use of additional information in SNMT, it has been shown that image information related to the translated sentence contributes to performance improvement (Imankulova et al., 2020; Caglayan et al., 2020). Zoph and Knight (2016) first proposed MSNMT, which uses multiple encoders, one for each source language, and a single decoder for the target language.…”
Section: Related Work (mentioning, confidence: 99%)
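The multi-encoder design the quote attributes to Zoph and Knight (2016) can be sketched as one encoder per source language and a single decoder attending over the concatenated encoder states. Layer choices, dimensions, and the concatenation-based fusion below are assumptions for illustration, not the original implementation.

```python
import torch
import torch.nn as nn

# Hedged sketch of multi-source NMT: two encoders (one per source language)
# and a single decoder that attends over both encoders' states.

class MultiSourceNMT(nn.Module):
    def __init__(self, vocab_a=1000, vocab_b=1000, vocab_tgt=1000, d=256):
        super().__init__()
        self.emb_a = nn.Embedding(vocab_a, d)
        self.emb_b = nn.Embedding(vocab_b, d)
        self.enc_a = nn.GRU(d, d, batch_first=True)
        self.enc_b = nn.GRU(d, d, batch_first=True)
        self.emb_t = nn.Embedding(vocab_tgt, d)
        self.dec = nn.GRU(d, d, batch_first=True)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.out = nn.Linear(d, vocab_tgt)

    def forward(self, src_a, src_b, tgt_in):
        ha, _ = self.enc_a(self.emb_a(src_a))   # encode source language A
        hb, _ = self.enc_b(self.emb_b(src_b))   # encode source language B
        mem = torch.cat([ha, hb], dim=1)        # fuse by concatenating states
        hd, _ = self.dec(self.emb_t(tgt_in))
        ctx, _ = self.attn(hd, mem, mem)        # one decoder attends to both
        return self.out(ctx + hd)

model = MultiSourceNMT()
logits = model(torch.randint(0, 1000, (2, 7)),   # source A batch
               torch.randint(0, 1000, (2, 6)),   # source B batch
               torch.randint(0, 1000, (2, 5)))   # shifted target batch
print(logits.shape)  # torch.Size([2, 5, 1000])
```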