Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1227
Multimodal Pivots for Image Caption Translation

Abstract: We present an approach to improve statistical machine translation of image descriptions by multimodal pivots defined in visual space. The key idea is to perform image retrieval over a database of images that are captioned in the target language, and use the captions of the most similar images for crosslingual reranking of translation outputs. Our approach does not depend on the availability of large amounts of in-domain parallel data, but only relies on available large datasets of monolingually captioned image…
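As a reading aid, here is a minimal, hedged sketch of the retrieve-and-rerank idea described in the abstract: retrieve target-language captions of visually similar images, then use them to rescore an MT n-best list. The feature vectors, the lexical-overlap scorer, the interpolation weight, and the helper names (retrieve_pivot_captions, rerank) are assumptions made for illustration only, not the paper's actual retrieval model or reranking objective.

```python
# Illustrative sketch of caption translation reranking via multimodal pivots.
# Assumptions (not from the paper): cosine similarity over generic image
# feature vectors, bag-of-words overlap as caption similarity, and a simple
# linear interpolation of MT score and pivot similarity.
import numpy as np
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def retrieve_pivot_captions(query_img_vec, database, k=5):
    """Return target-language captions of the k visually most similar images."""
    ranked = sorted(database,
                    key=lambda e: cosine(query_img_vec, e["img_vec"]),
                    reverse=True)
    return [e["caption"] for e in ranked[:k]]


def caption_overlap(hypothesis, pivot_captions):
    """Crude lexical similarity between an MT hypothesis and retrieved captions."""
    hyp = Counter(hypothesis.lower().split())
    scores = []
    for cap in pivot_captions:
        ref = Counter(cap.lower().split())
        overlap = sum((hyp & ref).values())
        scores.append(overlap / max(len(hypothesis.split()), 1))
    return max(scores) if scores else 0.0


def rerank(nbest, query_img_vec, database, weight=0.5, k=5):
    """Rerank an MT n-best list [(hypothesis, mt_score), ...] with multimodal pivots."""
    pivots = retrieve_pivot_captions(query_img_vec, database, k)
    rescored = [(hyp, (1 - weight) * mt_score + weight * caption_overlap(hyp, pivots))
                for hyp, mt_score in nbest]
    return sorted(rescored, key=lambda x: x[1], reverse=True)


# Toy usage: random vectors stand in for features from a real visual model.
rng = np.random.default_rng(0)
database = [{"img_vec": rng.normal(size=8), "caption": "a dog runs on the beach"},
            {"img_vec": rng.normal(size=8), "caption": "a man rides a bicycle"}]
nbest = [("a dog runs on the sand", 0.6), ("a dog operates on the beach", 0.7)]
print(rerank(nbest, database[0]["img_vec"], database, weight=0.5))
```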

Cited by 86 publications (79 citation statements); References: 27 publications.
“…The aim of this task is to use images in addition to source languages as inputs to improve the translation performance, hopefully relaxing ambiguity in alignment that cannot be solved by texts only. The feasibility of this approach has been demonstrated by some methods, such as visual-based reranking of SMT results (Hitschler and Riezler 2016). However, this task assumes that images are available as a part of a query in the testing phase, and thus the objective and setup are entirely different from ours.…”
Section: Computer Vision For Machine Translationmentioning
confidence: 99%
“…Such resources currently exist with annotations in German [Elliott et al, 2016, Hitschler et al, 2016, Rajendran et al, 2016], Turkish [Unal et al, 2016], Chinese [Li et al, 2016], Japanese [Miyazaki and Shimizu, 2016, Yoshikawa et al, 2017], Dutch [van Miltenburg et al, 2017], and French. Table 1 presents an overview of multilingual image description datasets.…”
Section: Multilingual Multimodal Resourcesmentioning
confidence: 99%
“…These datasets are constructed in English and are aimed at advancing research on the generation of image descriptions in English. Recent attempts have been made to incorporate multilinguality into both these large-scale datasets, with the datasets being extended to other languages such as German and Japanese (Hitschler et al, 2016; Miyazaki and Shimizu, 2016; Yoshikawa et al, 2017).…”
Section: Related Workmentioning
confidence: 99%