Proceedings of the 5th Workshop on Vision and Language 2016
DOI: 10.18653/v1/w16-3210
Multi30K: Multilingual English-German Image Descriptions

Abstract: We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated on English-language datasets almost exclusively, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) German descriptions crowdsourced independently of the original English descriptions. We describe the data and ou…

Cited by 320 publications (281 citation statements)
References 12 publications
“…We extend our experiments to the Multi30K data set, which is built on the Flickr30K data set (Young et al., 2014) and consists of English, German (Elliott et al., 2016), and French (Elliott et al., 2017) captions. For Multi30K, there are 29,000 images in the training set, 1,014 in the development set and 1,000 in the test set.…”
Section: Extension To Multiple Languages
confidence: 99%
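The split sizes quoted in this citation statement can be summarized in a short sketch. The counts come directly from the statement above; the dictionary layout and names are illustrative, not part of any official Multi30K loader:

```python
# Multi30K image counts per split, as reported in the citation statement.
MULTI30K_SPLITS = {
    "train": 29_000,
    "dev": 1_014,
    "test": 1_000,
}

# Each image carries an English caption plus German (2016) and French
# (2017) captions, so the total number of images across all splits is:
total_images = sum(MULTI30K_SPLITS.values())
print(total_images)  # 31014
```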
“…With this goal, the dataset also serves in the WAT 2019 shared task on multi-modal translation. We illustrated that the text-only information in the surrounding words could be sufficient for the disambiguation. One interesting research direction would be thus to ignore all the surrounding words and simply ask: given the image, what is the correct Hindi translation of this ambiguous English word.…”
Section: Discussion
confidence: 90%
“…for resolving ambiguity due to different senses of words in different contexts. One of the starting points is "Flickr30k" [9]; a multilingual (English-German, English-French, and English-Czech) shared task based on multimodal translation was part of WMT 2018 [10]. [11] proposed a multimodal NMT system using image features for the Hindi-English language pair.…”
Section: Related Work
confidence: 99%
“…As test data, a set of 1,000 tuples containing an English description and its corresponding image was provided. More details about the shared task data can be found in (Elliott et al., 2016).…”
Section: Data
confidence: 99%