Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 1: Long Papers) 2017
DOI: 10.18653/v1/p17-1175
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation

Abstract: We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive decoder naturally incorporates spatial visual features obtained using pre-trained convolutional neural networks, bridging the gap between image description and translation. Our decoder learns to attend to source-language words and parts of an image independently by means of two separate attention mechanisms as it generates words in the target language. We find that our model can efficiently exploit not just back-translated …
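The abstract describes two independent attention mechanisms, one over source-language words and one over spatial image regions, whose context vectors both feed the decoder at each step. Below is a minimal sketch of that idea in PyTorch; the class name, dimensions, and the additive (Bahdanau-style) attention form are illustrative assumptions rather than the authors' exact implementation.

    # Sketch only: two independent soft attentions feeding one decoder step.
    # All names and dimensions are hypothetical, not the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DoublyAttentiveDecoderStep(nn.Module):
        def __init__(self, hid_dim, src_dim, img_dim, emb_dim, vocab_size):
            super().__init__()
            # One additive attention per modality (source words, image regions).
            self.src_query = nn.Linear(hid_dim, src_dim)
            self.src_score = nn.Linear(src_dim, 1)
            self.img_query = nn.Linear(hid_dim, img_dim)
            self.img_score = nn.Linear(img_dim, 1)
            self.rnn = nn.GRUCell(emb_dim + src_dim + img_dim, hid_dim)
            self.out = nn.Linear(hid_dim, vocab_size)

        def attend(self, query, keys, score_layer, proj_layer):
            # keys: (batch, n, dim); query: (batch, hid_dim)
            energy = score_layer(torch.tanh(keys + proj_layer(query).unsqueeze(1)))
            alpha = F.softmax(energy.squeeze(-1), dim=-1)          # (batch, n)
            return torch.bmm(alpha.unsqueeze(1), keys).squeeze(1)  # (batch, dim)

        def forward(self, y_emb, h_prev, src_states, img_feats):
            # The two attentions are computed independently from the same state.
            src_ctx = self.attend(h_prev, src_states, self.src_score, self.src_query)
            img_ctx = self.attend(h_prev, img_feats, self.img_score, self.img_query)
            h = self.rnn(torch.cat([y_emb, src_ctx, img_ctx], dim=-1), h_prev)
            return self.out(h), h

At each target position the decoder consumes the previous word embedding plus one context vector per modality, so it can weight source words and image regions independently, which is the core of the doubly-attentive design.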

Cited by 131 publications (99 citation statements) · References 34 publications

Citation statements, ordered by relevance:
“…First, Section 3.1 looks at NMT for e-commerce, describing important parts of a more extended study that is reported in detail in Calixto et al (2017b). The second use case (Section 3.2) is an evaluation performed by Iconic Translation Machines Ltd.…”
Section: Use Cases (mentioning)
confidence: 99%
“…MT Systems -Three different systems were compared in this experiment: (1) a PBSMT baseline model built with the Moses SMT Toolkit (Koehn et al, 2007), (2) a text-only NMT model (NMT_t), and (3) a multi-modal NMT model (NMT_m), described in more detail in Calixto et al (2017b), which expands upon the text-only attention-based model and introduces a visual component to incorporate local visual features.…”
Section: NMT for E-commerce Product Listing (mentioning)
confidence: 99%
“…By using an ensemble of four different multimodal NMT models trained on the translated Multi30k training data, we were able to obtain translations comparable to or even better than those obtained with the strong multi-modal NMT model of Calixto et al (2017a), which is pretrained on large amounts of WMT data and uses local image features.…”
Section: Results (mentioning)
confidence: 96%
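The ensembling this statement refers to combines the per-step predictions of several independently trained models. A minimal sketch of probability averaging, assuming each model exposes a hypothetical step() returning log-probabilities over the target vocabulary; the cited paper's exact ensembling scheme may differ.

    import torch

    def ensemble_step(models, states, y_prev):
        # Average per-step word probabilities across models; return the
        # combined log-probabilities and the updated per-model decoder states.
        probs, new_states = [], []
        for model, state in zip(models, states):
            log_p, new_state = model.step(y_prev, state)  # hypothetical interface
            probs.append(log_p.exp())
            new_states.append(new_state)
        return torch.log(torch.stack(probs).mean(dim=0)), new_states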
“…Table 3: Results for the best model of Calixto et al (2017a), which is pre-trained on the English-German WMT 2015 (Bojar et al, 2015), and different combinations of multi-modal models, all trained on the original M30k_T training data only, evaluated on the M30k_T 2016 test set.…”
Section: Results (mentioning)
confidence: 99%