Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 2016
DOI: 10.18653/v1/w16-2346
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description

Abstract: This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language, given an image and/or one or more descriptions in a different (source) language. This challenge was organised along with the Conference on Machine Translation (WMT16), and called for system submissions for two task variants: (i) a translation task, in which a source language image description needs to be transla…

Cited by 173 publications (106 citation statements)
References 25 publications
“…At first dominated by statistical methods combining count-based translation and language models [33], the current paradigm relies upon deep neural network models [34]. New ideas continue to be introduced, including models which take advantage of shared visual context [35], but the majority of MT research has focused on the text-to-text case. Recent work has moved beyond that paradigm by implementing translation between speech audio in the source language and written text in the target language [36,37,38].…”
Section: Relation To Prior Work
confidence: 99%
“…Multimodal machine translation (MMT) has been the subject of two large-scale shared task evaluations at the Conference on Machine Translation [Specia et al., 2016], which we refer to as MMT16 and MMT17. These shared tasks have focused on generating descriptions of images in non-English languages, either by translating parallel text or by crosslingual description using independently collected sentences.…”
Section: Evaluating Multilingual Multimodal Models
confidence: 99%
“…no re-training or fine-tuning (using the post-edited development set) was performed; only the gold-standard data was (marginally) different due to the post-edits. Table 6 shows the relative difference in system performance when evaluated using the post-edited references as compared to the original ranking [Specia et al., 2016]. The differences in performance between the two test sets are nonexistent or marginal and do not lead to any changes in the overall ranking of the systems.…”
Section: (B) English Description Inaccurate
confidence: 99%
“…processing (NLP) tasks as well, such as image captioning [26] and some task-specific translation, e.g. sign language translation [5]. However, [23] demonstrates that most multimodal translation algorithms are not significantly better than an off-the-shelf text-only machine translation (MT) model on the Multi30K dataset [11]. How translation models should take advantage of visual context remains an open question, because from the perspective of information theory the mutual information of two random variables, I(X; Y), will always be no greater than I(X; Y, Z), due to the following fact.…”
Section: Introduction
confidence: 99%
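The "fact" the last excerpt cuts off is presumably the chain rule for mutual information combined with the non-negativity of conditional mutual information; a minimal sketch in standard notation (the variable roles below are an interpretation, not stated in the excerpt):

```latex
% Chain rule for mutual information, then non-negativity of
% conditional mutual information:
\begin{aligned}
I(X; Y, Z) &= I(X; Y) + I(X; Z \mid Y) \\
           &\ge I(X; Y), \qquad \text{since } I(X; Z \mid Y) \ge 0 .
\end{aligned}
```

Reading X as the target sentence, Y as the source sentence, and Z as the image, the inequality says that conditioning on the visual context Z can never reduce the information available about the target, which is why the excerpt treats the lack of gains over text-only MT as a puzzle.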