Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) 2019
DOI: 10.18653/v1/w19-5319
Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models

Abstract: We study several methods for full or partial sharing of the decoder parameters of multilingual NMT models. We evaluate both fully supervised and zero-shot translation performance in 110 unique translation directions using only the WMT 2019 shared task parallel datasets for training. We use additional test sets and re-purpose evaluation methods recently used for unsupervised MT in order to evaluate zero-shot translation performance for language pairs where no gold-standard parallel data is available. To our kno…
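The abstract mentions zero-shot translation directions without spelling out the mechanism here; as a hedged illustration only (not necessarily this paper's exact setup), the sketch below shows the standard target-language-token approach (Johnson et al., 2017) that lets a single multilingual model be asked for directions it never saw in training. The `<2xx>` token format and the toy sentences are assumptions.

```python
# Minimal sketch of target-language tagging for multilingual NMT, a common baseline
# that enables zero-shot translation. Not the authors' implementation; the "<2xx>"
# token format and the example sentences are illustrative assumptions.

def tag_example(src_sentence: str, tgt_lang: str) -> str:
    """Prepend a target-language token, e.g. '<2de>' to request German output."""
    return f"<2{tgt_lang}> {src_sentence}"

# A supervised direction seen in training (English -> German):
print(tag_example("Multilingual models share parameters across languages.", "de"))
# A zero-shot request at test time (French -> Czech), possible even when no
# French-Czech parallel data was used for training:
print(tag_example("Les modèles multilingues partagent leurs paramètres.", "cs"))
```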

Cited by 11 publications (12 citation statements); references 36 publications (43 reference statements).
“…Pre-trained embedding is trained on monolingual data for 5 iterations and used as an initialization for the RNN model. (Hokamp et al., 2019) The Aylien research team built a multilingual NMT system which is trained on all WMT 2019 language pairs in all directions, then fine-tuned for a small number of iterations on Gujarati-English data only, including some self-backtranslated data.…”
Section: Apprentice-C (Li and Specia, 2019)
confidence: 99%
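The snippet above describes a two-stage regime: train one multilingual model on all WMT 2019 pairs in both directions, then fine-tune briefly on Gujarati-English, mixing in self-back-translated data. As a rough, self-contained sketch of how such a fine-tuning set could be assembled (the tiny corpora, the synthetic marker string, and the variable names are placeholders, not the Aylien team's pipeline):

```python
# Hedged sketch: assembling a Gujarati-English fine-tuning set from gold parallel
# data plus self-back-translated sentences. All data below are placeholders; this
# is not the Aylien system's actual code.
import random

gold_gu_en = [
    ("ગુજરાતી વાક્ય એક", "Gujarati sentence one"),
    ("ગુજરાતી વાક્ય બે", "Gujarati sentence two"),
]

# Monolingual English sentences translated back into Gujarati by the model itself
# (faked here with a marker string) become extra synthetic training pairs.
mono_en = ["An English sentence with no gold Gujarati translation."]
backtranslated = [(f"<synthetic-gu for: {en}>", en) for en in mono_en]

finetune_set = gold_gu_en + backtranslated
random.shuffle(finetune_set)  # mix gold and synthetic pairs before fine-tuning
for src, tgt in finetune_set:
    print(src, "=>", tgt)
```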
“…Air Force Research Laboratory; APERTIUM-FIN-ENG: Apertium (Pirinen, 2019); APPRENTICE-C: Apprentice (Li and Specia, 2019); AYLIEN_MULTILINGUAL: Aylien Ltd. (Hokamp et al., 2019); BAIDU: Baidu (Sun et al., 2019); BTRANS: (no associated paper)…”
Section: AFRL
confidence: 99%
“…By minimizing the diversity of representations, the decoder's task is simplified and it becomes better at language generation. The choice of a single encoder for all languages is also promoted by Hokamp et al. [64], who opt for language-specific decoders. Murthy et al. [101] pointed out that the sentence representations generated by the encoder are dependent on the word order of the language and are, hence, language-specific.…”
Section: Addressing Language Divergence
confidence: 99%
“…This shows that dedicating a few parameters to learn language tokens can help a decoder maintain a balance between language-agnostic and language-distinct features. Hokamp et al. [64] showed that more often than not, using separate decoders and attention mechanisms gives better results than a shared decoder and attention mechanism. This work implies that the best way to handle language divergence would be to use a shared encoder for source languages and different decoders for target languages.…”
Section: Addressing Language Divergence
confidence: 99%
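Both survey excerpts describe the same architectural choice: a single encoder shared across source languages with a separate decoder (and attention) per target language. The sketch below is a minimal PyTorch rendering of that general idea, not the paper's implementation; the class name, layer sizes, toy vocabulary, and the omission of attention masks are all simplifying assumptions.

```python
# Hedged sketch (PyTorch) of a shared encoder with per-target-language decoders.
# Dimensions, vocabulary size, and the set of target languages are illustrative.
import torch
import torch.nn as nn

class SharedEncSeparateDec(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, tgt_langs=("de", "cs", "gu")):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)  # shared across sources
        # One decoder stack per target language: decoder parameters are NOT shared.
        self.decoders = nn.ModuleDict({
            lang: nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
                num_layers=2)
            for lang in tgt_langs
        })
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids, tgt_lang):
        memory = self.encoder(self.embed(src_ids))               # language-agnostic encoding
        hidden = self.decoders[tgt_lang](self.embed(tgt_ids), memory)  # target-specific decoder
        return self.out(hidden)                                  # logits over the target vocab

model = SharedEncSeparateDec()
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 1000, (2, 5))   # teacher-forced target prefix, length 5
logits = model(src, tgt, tgt_lang="de")
print(logits.shape)                    # torch.Size([2, 5, 1000])
```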