2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01073

Unsupervised Multi-Modal Neural Machine Translation

Abstract: Unsupervised neural machine translation (UNMT) has recently achieved remarkable results [19] with only large monolingual corpora in each language. However, the uncertainty of associating target with source sentences makes UNMT theoretically an ill-posed problem. This work investigates the possibility of utilizing images for disambiguation to improve the performance of UNMT. Our assumption is intuitively based on the invariant property of images, i.e., the description of the same visual content by different lang…
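The abstract's core idea can be made concrete with a small sketch: image features, being invariant to the language of the caption, are projected into the text representation space and offered to the decoder as additional context for disambiguation. The code below is an illustrative assumption in PyTorch, not the paper's actual architecture; the class and parameter names (ImageGroundedTranslator, img_dim, the concatenation-based fusion) are hypothetical.

import torch
import torch.nn as nn


class ImageGroundedTranslator(nn.Module):
    """Hypothetical image-grounded encoder-decoder, for illustration only."""

    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=2, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        # Project pre-extracted image region features (e.g., CNN outputs) into
        # the same space as the text states so both can be attended jointly.
        self.img_proj = nn.Linear(img_dim, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, img_feats, tgt_tokens):
        text_states = self.encoder(self.embed(src_tokens))   # (B, S, d_model)
        img_states = self.img_proj(img_feats)                 # (B, R, d_model)
        # Concatenate text and image states into one memory bank; the decoder
        # can then use the visual content to disambiguate the source sentence.
        memory = torch.cat([text_states, img_states], dim=1)
        tgt = self.embed(tgt_tokens)
        causal = torch.triu(
            torch.full((tgt.size(1), tgt.size(1)), float("-inf")), diagonal=1
        )
        dec = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(dec)                                  # (B, T, vocab)


if __name__ == "__main__":
    model = ImageGroundedTranslator(vocab_size=1000)
    src = torch.randint(0, 1000, (2, 12))   # source-language token ids
    img = torch.randn(2, 36, 2048)          # 36 region features per image
    tgt = torch.randint(0, 1000, (2, 10))   # shifted target token ids
    print(model(src, img, tgt).shape)       # torch.Size([2, 10, 1000])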

Cited by 50 publications (36 citation statements)
References 23 publications
“…Elliott and Kádár (2017) and Helcl et al (2018) investigate visually grounded representations to improve supervised multimodal machine translation, and ignore input images at test time. Using reinforcement learning, Chen et al (2018) jointly optimize a captioner and a neural machine translator to achieve unsupervised multimodal machine translation, while Su et al (2019) and Huang et al (2020) explore transformers (Vaswani et al, 2017) to construct a text encoder-decoder for the same goal. Our work differs from the cited multimodal machine translation works in that it starts from multilingual image captioning and is then applied to machine translation, whereas some of the other methods start from multimodal machine translation and are applied to machine translation; building models that take advantage of both tasks is a possible avenue for future work.…”
Section: Related Work
confidence: 99%
“…Many of the previous methods rely on pre-training on external data for either captioning or machine translation and fine-tune models using Task 1 data from Multi30k, while we rely only on the provided Task 2 data from Multi30k. For example, Su et al (2019) and Huang et al (2020) both utilize WMT News Crawl datasets to pre-train machine translation models.…”
Section: Related Work
confidence: 99%
“…For example, Calixto, Rios, and Aziz (2019) set a latent variable as a stochastic embedding that is used in the target-language decoder and to predict visual features. Chen, Jin, and Fu (2019) present a progressive learning approach for image-pivoted zero-resource machine translation, and Su et al (2019) investigate the possibility of utilizing images for disambiguation to improve the performance of unsupervised machine translation.…”
Section: Related Work
confidence: 99%
“…Using visual content for unsupervised MT (Su et al, 2019) is a promising solution for pivoting and alignment, given its availability and feasibility. Abundant multimodal content in various languages is available online (e.g.…”
Section: Introduction
confidence: 99%