Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1146
Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation

Abstract: This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting. We report on a systematic comparison of multistage fine-tuning configurations, consisting of (1) pre-training on an external large (209k-440k) parallel corpus for English and a helping target language, (2) mixed pre-training or fine-tuning on a mixture of the external and low-resource (18k) target parallel corpora, and (3) pure fine-tuning on the targ…
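The abstract's three-stage recipe can be read as a simple data schedule. The sketch below is not the authors' code; Corpus and train are hypothetical placeholders used only to show the order in which the external, mixed, and low-resource corpora would be fed to a single model.

```python
# Hypothetical sketch of the three training stages listed in the abstract.
# `Corpus` and `train` are placeholders, not the authors' implementation or
# any real NMT library; only the staging of the data is being illustrated.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Corpus:
    name: str
    pairs: List[Tuple[str, str]]  # (source sentence, target sentence)


def train(model_state: dict, corpus: Corpus, label: str) -> dict:
    """Stand-in for an NMT training/fine-tuning run: records what was seen."""
    model_state.setdefault("stages", []).append((label, corpus.name, len(corpus.pairs)))
    return model_state


# Toy corpora standing in for the real sizes (209k-440k external, 18k target).
external = Corpus("en-helping (external, large)", [("hello", "bonjour")] * 5)
target = Corpus("en-target (low-resource, small)", [("hello", "hola")] * 2)
mixed = Corpus("mixed external + target", external.pairs + target.pairs)

model = {}
model = train(model, external, "stage 1: pre-train on external corpus")
model = train(model, mixed, "stage 2: mixed fine-tuning on external + target")
model = train(model, target, "stage 3: pure fine-tuning on target corpus")

for stage in model["stages"]:
    print(stage)
```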

Cited by 39 publications (34 citation statements)
References 14 publications
“…Although multilingualism is known to improve performance for low-resource languages, we observe drops in performance for some of the language pairs involved. The work on multi-stage fine-tuning (Dabre et al. 2019), which uses N-way parallel corpora similar to the ones used in our work, supports our observations regarding drops in performance. However, based on the characteristics of our parallel corpora, the multilingual models trained and evaluated in our work do not benefit from additional knowledge by increasing the number of translation directions, because the translation content for all language pairs is the same.…”
Section: Multilingualism (supporting)
confidence: 87%
“…One of the advantages of multilingual NMT is its ability to leverage high-resource language pairs to improve the translation quality on low-resource ones. Previous studies have shown that jointly learning low-resource and high-resource pairs leads to improved translation quality for the low-resource one (Johnson et al. 2017; Dabre et al. 2019). Furthermore, the performance tends to improve as the number of language pairs (and consequently the training data) increases (Aharoni et al. 2019).…”
Section: Multilingual NMT (mentioning)
confidence: 99%
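Joint multilingual training of the kind cited here (Johnson et al. 2017) is commonly implemented by prepending a target-language token to each source sentence so that one model serves several translation directions. A minimal sketch under that assumption follows; the corpora, sentences, and the `<2xx>` tag format are illustrative placeholders, not the cited authors' data.

```python
# Minimal sketch of building a tagged one-to-many training set, in the spirit
# of Johnson et al. (2017). All corpora and sentences are invented placeholders.
from typing import Dict, List, Tuple


def tag_corpus(pairs: List[Tuple[str, str]], target_lang: str) -> List[Tuple[str, str]]:
    """Prepend a target-language token to every English source sentence."""
    return [(f"<2{target_lang}> {src}", tgt) for src, tgt in pairs]


corpora: Dict[str, List[Tuple[str, str]]] = {
    "fr": [("good morning", "bonjour")],            # high-resource helping pair
    "xx": [("good morning", "good-morning-in-xx")], # low-resource target pair
}

# Joint training data: concatenate all tagged directions into one corpus.
joint_training_set = [
    example
    for lang, pairs in corpora.items()
    for example in tag_corpus(pairs, lang)
]
for src, tgt in joint_training_set:
    print(src, "->", tgt)
```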
“…Johnson et al. [70] showed that joint training does not provide any significant benefit. Fine-tuning is beneficial in very low-resource scenarios [37], but gains may be limited due to catastrophic forgetting. Dabre et al. [37] showed that a multi-stage fine-tuning process is beneficial when multiple target languages are involved.…”
Section: Training (mentioning)
confidence: 99%
“…Fine-tuning is beneficial in very low-resource scenarios [37], but gains may be limited due to catastrophic forgetting. Dabre et al. [37] showed that a multi-stage fine-tuning process is beneficial when multiple target languages are involved. They do not focus on language divergence during their multilingual multi-stage tuning but show that the size of helping data matters.…”
Section: Training (mentioning)
confidence: 99%
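Catastrophic forgetting, as mentioned in the statements above, is usually diagnosed by re-scoring the helping (high-resource) direction after further fine-tuning and checking for a drop. The following is only an illustration of that bookkeeping with made-up placeholder scores; it is not an evaluation from the paper or the citing works.

```python
# Illustration of spotting catastrophic forgetting: compare the helping-direction
# score before and after pure fine-tuning on the low-resource corpus.
# All numbers are made-up placeholders, not reported results.
checkpoints = {
    "after pre-training on external corpus": {"en->helping": 0.30, "en->target": 0.05},
    "after pure fine-tuning on target corpus": {"en->helping": 0.18, "en->target": 0.21},
}

before = checkpoints["after pre-training on external corpus"]["en->helping"]
after = checkpoints["after pure fine-tuning on target corpus"]["en->helping"]
print(f"Helping-direction score dropped by {before - after:.2f}; "
      "mixed fine-tuning (replaying external data) is the usual mitigation.")
```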