Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.184
Simultaneous Machine Translation with Visual Context

Abstract: Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible. The translation thus has to start with an incomplete source text, which is read progressively, creating the need for anticipation. In this paper, we seek to understand whether the addition of visual information can compensate for the missing source context. To this end, we analyse the impact of different multimodal approaches and visual features on…
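As a concrete illustration of the streaming setting the abstract describes, the sketch below implements a simple wait-k read/write loop: the decoder starts writing after only k source tokens have been read, i.e. from an incomplete source prefix. The `decode_next` function is a hypothetical placeholder for any incremental translation model, not the paper's method.

```python
# Minimal sketch of SiMT with a fixed wait-k policy: read k source tokens,
# then alternate READ and WRITE. `decode_next` is a hypothetical stand-in
# for an incremental NMT decoder, not the paper's model.

from typing import List


def decode_next(source_prefix: List[str], target_prefix: List[str]) -> str:
    """Emit the next target token from an *incomplete* source prefix.
    A real system would query a trained decoder here."""
    return f"tgt_{len(target_prefix)}"  # placeholder token


def wait_k_translate(source_stream: List[str], k: int = 2) -> List[str]:
    target: List[str] = []
    read = 0
    while len(target) < len(source_stream):  # toy stopping criterion
        # READ: stay exactly k tokens ahead of the output, until exhausted.
        while read < min(len(target) + k, len(source_stream)):
            read += 1
        # WRITE: commit one target token based on what has been read so far.
        target.append(decode_next(source_stream[:read], target))
    return target


print(wait_k_translate("ein Mann spielt Gitarre im Park".split(), k=2))
```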

Cited by 15 publications (16 citation statements)
References 34 publications
“…While most methods employ the attention mechanism to learn to attend to relevant regions in an image, the shortage of annotated data could impair the attention module (see Table 5 (b)). Some recent efforts (Lin et al., 2020; Caglayan et al., 2020) address the issue by feeding models with pre-extracted visual objects instead of the whole image. However, these methods are easily affected by the quality of the extracted objects.…”
Section: Discussion (mentioning, confidence: 99%)
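To make the distinction in the quote concrete, the sketch below shows a decoder state attending over pre-extracted object features (e.g. from an object detector) rather than a whole-image grid. All names, dimensions (36 objects with 2048-d features), and the projection layer are illustrative assumptions, not code from the cited works.

```python
import torch
import torch.nn as nn

# Hedged sketch: cross-attention from one decoder time step over a bag of
# pre-extracted visual object features. Shapes are illustrative assumptions.

d = 256
visual_proj = nn.Linear(2048, d)          # project detector features to model dim
cross_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

obj_feats = torch.randn(1, 36, 2048)      # 36 detected objects, 2048-d each
dec_state = torch.randn(1, 1, d)          # current decoder hidden state

mem = visual_proj(obj_feats)
ctx, weights = cross_attn(dec_state, mem, mem)
print(ctx.shape, weights.shape)           # visual context vector + attention map
```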
“…Details on training are given in Appendix A. We use pysimt (Caglayan et al., 2020) with PyTorch (Paszke et al., 2019) v1.4 for our experiments.…”
Section: Training (mentioning, confidence: 99%)
“…On the other hand, models with the fixed policy have a much simpler architecture and lower latency compared to more complicated models with the adaptive policy. As for the use of additional information in SNMT, it has been shown that image information related to the translated sentence contributes to performance improvement (Imankulova et al., 2020; Caglayan et al., 2020). Zoph and Knight (2016) first proposed MSNMT, which uses multiple encoders, one for each source language, and a single decoder for the target language.…”
Section: Related Work (mentioning, confidence: 99%)
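The multi-encoder design the quote attributes to Zoph and Knight (2016) can be sketched as one encoder per source language and a single decoder attending over the concatenated encoder states. Layer choices, dimensions, and the concatenation-based fusion below are assumptions for illustration, not the original implementation.

```python
import torch
import torch.nn as nn

# Hedged sketch of multi-source NMT: two encoders (one per source language)
# and a single decoder that attends over both encoders' states.

class MultiSourceNMT(nn.Module):
    def __init__(self, vocab_a=1000, vocab_b=1000, vocab_tgt=1000, d=256):
        super().__init__()
        self.emb_a = nn.Embedding(vocab_a, d)
        self.emb_b = nn.Embedding(vocab_b, d)
        self.enc_a = nn.GRU(d, d, batch_first=True)
        self.enc_b = nn.GRU(d, d, batch_first=True)
        self.emb_t = nn.Embedding(vocab_tgt, d)
        self.dec = nn.GRU(d, d, batch_first=True)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.out = nn.Linear(d, vocab_tgt)

    def forward(self, src_a, src_b, tgt_in):
        ha, _ = self.enc_a(self.emb_a(src_a))   # encode source language A
        hb, _ = self.enc_b(self.emb_b(src_b))   # encode source language B
        mem = torch.cat([ha, hb], dim=1)        # fuse by concatenating states
        hd, _ = self.dec(self.emb_t(tgt_in))
        ctx, _ = self.attn(hd, mem, mem)        # one decoder attends to both
        return self.out(ctx + hd)

model = MultiSourceNMT()
logits = model(torch.randint(0, 1000, (2, 7)),   # source A batch
               torch.randint(0, 1000, (2, 6)),   # source B batch
               torch.randint(0, 1000, (2, 5)))   # shifted target batch
print(logits.shape)  # torch.Size([2, 5, 1000])
```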