Proceedings of the Second Conference on Machine Translation 2017
DOI: 10.18653/v1/w17-4752

Sheffield MultiMT: Using Object Posterior Predictions for Multimodal Machine Translation

Abstract: This paper describes the University of Sheffield's submission to the WMT17 Multimodal Machine Translation shared task. We participated in Task 1 to develop an MT system to translate an image description from English to German and French, given its corresponding image. Our proposed systems are based on the state-of-the-art Neural Machine Translation approach. We investigate the effect of replacing the commonly-used image embeddings with an estimated posterior probability prediction for 1,000 object categories […]
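
To make the idea in the abstract concrete, the fragment below is a minimal PyTorch-style sketch, not the authors' code: the class name, layer sizes, and usage are assumptions. It shows how a softmax posterior over 1,000 object categories could stand in for a pooled image embedding when initialising an NMT decoder.

# Minimal sketch, not the authors' implementation; names and sizes are assumptions.
import torch
import torch.nn as nn

class PosteriorInit(nn.Module):
    """Map a 1,000-way object posterior to the initial hidden state of an RNN decoder."""
    def __init__(self, num_classes=1000, hidden_size=512):
        super().__init__()
        self.proj = nn.Linear(num_classes, hidden_size)

    def forward(self, object_posterior):
        # object_posterior: (batch, 1000), e.g. the softmax output of an object classifier
        return torch.tanh(self.proj(object_posterior))  # (batch, hidden_size), used as decoder h_0

# Hypothetical usage:
# v = torch.softmax(torch.randn(4, 1000), dim=-1)   # posterior predictions for 4 images
# h0 = PosteriorInit()(v)                            # fed to the decoder as its initial state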

Cited by 15 publications (5 citation statements) · References 20 publications
“…Initial approaches use RNN-based sequence-to-sequence models (Bahdanau et al., 2015) enhanced with a single, global image vector, extracted as one of the layers of a CNN trained for object classification (He et al., 2016), often the penultimate or final layer. The image representation is integrated into the MT models by initialising the encoder or decoder (Elliott et al., 2015; Caglayan et al., 2017; Madhyastha et al., 2017); element-wise multiplication with the source word annotations (Caglayan et al., 2017); or projecting the image representation and encoder context to a common space to initialise the decoder. Elliott and Kádár (2017) and Helcl et al. (2018) instead model the source sentence and reconstruct the image representation jointly via multi-task learning.…”
Section: Related Work
confidence: 99%
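
The two integration strategies quoted above can be sketched as follows. This is a hypothetical PyTorch fragment with assumed tensor shapes and projection layers, not code from any of the cited systems: (a) element-wise multiplication of a projected image vector with the source word annotations, and (b) projecting the image representation and pooled encoder context into a common space to initialise the decoder.

# Illustrative only; shapes and projections are assumptions, not taken from the cited papers.
import torch
import torch.nn as nn

def multiply_with_annotations(annotations, image_vec, img_to_enc):
    # annotations: (batch, src_len, enc_dim) encoder states; image_vec: (batch, img_dim)
    gate = torch.tanh(img_to_enc(image_vec)).unsqueeze(1)   # (batch, 1, enc_dim)
    return annotations * gate                                # broadcast over source positions

def init_decoder_from_common_space(annotations, image_vec, img_proj, ctx_proj):
    context = annotations.mean(dim=1)                        # pooled encoder context
    return torch.tanh(img_proj(image_vec) + ctx_proj(context))  # shared-space decoder init

# Hypothetical usage:
# ann, img = torch.randn(2, 7, 512), torch.randn(2, 2048)
# fused = multiply_with_annotations(ann, img, nn.Linear(2048, 512))
# h0 = init_decoder_from_common_space(ann, img, nn.Linear(2048, 512), nn.Linear(512, 512))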
“…Our proposed model, even though it is textual, produced results competitive with other multimodal models. The mixture-of-experts model outperformed several multimodal models, including another WMT submission [29]–[32]. Even on the out-of-domain COCO 2017 dataset, the mixture-of-experts model performed reasonably well, with a 28.0 BLEU score.…”
Section: Model Specification and Implementation Details
confidence: 89%
“…Later initialisation variants are applied to attentive NMTs: Calixto et al. (2016) and Libovický et al. (2016) experiment with recurrent decoder initialisation while Ma et al. (2017) initialise both the encoder and the decoder, with features from a state-of-the-art ResNet (He et al. 2016). Madhyastha et al. (2017) explore the expressiveness of the posterior probability vector as a visual representation, rather than the pooled features from the penultimate layer of a CNN.…”
Section: Sequence-to-sequence Grounding With Pooled Features
confidence: 99%
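
The contrast drawn in this last excerpt, pooled penultimate-layer features versus a posterior probability vector, can be illustrated with a stock torchvision ResNet-50. This is a sketch under that assumption; the cited systems' exact feature-extraction pipelines may differ.

# Sketch only: a stock torchvision ResNet-50, not the cited systems' extraction code.
import torch
import torch.nn as nn
import torchvision.models as models

resnet = models.resnet50()                                  # pretrained weights omitted here
penultimate = nn.Sequential(*list(resnet.children())[:-1])  # everything up to global avg-pool

x = torch.randn(1, 3, 224, 224)                             # a dummy image batch
with torch.no_grad():
    pooled = penultimate(x).flatten(1)                      # (1, 2048) pooled feature vector
    posterior = torch.softmax(resnet(x), dim=-1)            # (1, 1000) object class posterior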