Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1146
Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation

Abstract: This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting. We report on a systematic comparison of multistage fine-tuning configurations, consisting of (1) pre-training on an external large (209k-440k) parallel corpus for English and a helping target language, (2) mixed pre-training or fine-tuning on a mixture of the external and low-resource (18k) target parallel corpora, and (3) pure fine-tuning on the targ…
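The abstract's three-stage recipe can be read as a simple data schedule. The sketch below is not the authors' code; Corpus and train are hypothetical placeholders used only to show the order in which the external, mixed, and low-resource corpora would be fed to a single model.

```python
# Hypothetical sketch of the three training stages listed in the abstract.
# `Corpus` and `train` are placeholders, not the authors' implementation or
# any real NMT library; only the staging of the data is being illustrated.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Corpus:
    name: str
    pairs: List[Tuple[str, str]]  # (source sentence, target sentence)


def train(model_state: dict, corpus: Corpus, label: str) -> dict:
    """Stand-in for an NMT training/fine-tuning run: records what was seen."""
    model_state.setdefault("stages", []).append((label, corpus.name, len(corpus.pairs)))
    return model_state


# Toy corpora standing in for the real sizes (209k-440k external, 18k target).
external = Corpus("en-helping (external, large)", [("hello", "bonjour")] * 5)
target = Corpus("en-target (low-resource, small)", [("hello", "hola")] * 2)
mixed = Corpus("mixed external + target", external.pairs + target.pairs)

model = {}
model = train(model, external, "stage 1: pre-train on external corpus")
model = train(model, mixed, "stage 2: mixed fine-tuning on external + target")
model = train(model, target, "stage 3: pure fine-tuning on target corpus")

for stage in model["stages"]:
    print(stage)
```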

Cited by 39 publications (34 citation statements)
References 14 publications
“…Although multilingualism is known to improve performance for low-resource languages, we observe drops in performance for some of the language pairs involved. The work on multi-stage fine-tuning (Dabre et al. 2019), which uses N-way parallel corpora similar to the ones used in our work, supports our observations regarding drops in performance. However, based on the characteristics of our parallel corpora, the multilingual models trained and evaluated in our work do not benefit from additional knowledge by increasing the number of translation directions, because the translation content for all language pairs is the same.…”
Section: Multilingualism (supporting)
confidence: 87%
“…One of the advantages of multilingual NMT is its ability to leverage high-resource language pairs to improve the translation quality on low-resource ones. Previous studies have shown that jointly learning low-resource and high-resource pairs leads to improved translation quality for the low-resource one (Johnson et al. 2017; Dabre et al. 2019). Furthermore, the performance tends to improve as the number of language pairs (and consequently the training data) increases (Aharoni et al. 2019).…”
Section: Multilingual NMT (mentioning)
confidence: 99%
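Joint multilingual training of the kind cited here (Johnson et al. 2017) is commonly implemented by prepending a target-language token to each source sentence so that one model serves several translation directions. A minimal sketch under that assumption follows; the corpora, sentences, and the `<2xx>` tag format are illustrative placeholders, not the cited authors' data.

```python
# Minimal sketch of building a tagged one-to-many training set, in the spirit
# of Johnson et al. (2017). All corpora and sentences are invented placeholders.
from typing import Dict, List, Tuple


def tag_corpus(pairs: List[Tuple[str, str]], target_lang: str) -> List[Tuple[str, str]]:
    """Prepend a target-language token to every English source sentence."""
    return [(f"<2{target_lang}> {src}", tgt) for src, tgt in pairs]


corpora: Dict[str, List[Tuple[str, str]]] = {
    "fr": [("good morning", "bonjour")],            # high-resource helping pair
    "xx": [("good morning", "good-morning-in-xx")], # low-resource target pair
}

# Joint training data: concatenate all tagged directions into one corpus.
joint_training_set = [
    example
    for lang, pairs in corpora.items()
    for example in tag_corpus(pairs, lang)
]
for src, tgt in joint_training_set:
    print(src, "->", tgt)
```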
“…Johnson et al. [70] showed that joint training does not provide any significant benefit. Fine-tuning is beneficial in very low-resource scenarios [37], but gains may be limited due to catastrophic forgetting. Dabre et al. [37] showed that a multi-stage fine-tuning process is beneficial when multiple target languages are involved.…”
Section: Training (mentioning)
confidence: 99%
“…Fine-tuning is beneficial in very low-resource scenarios [37], but gains may be limited due to catastrophic forgetting. Dabre et al. [37] showed that a multi-stage fine-tuning process is beneficial when multiple target languages are involved. They do not focus on language divergence during their multilingual multi-stage tuning but show that the size of helping data matters.…”
Section: Training (mentioning)
confidence: 99%
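Catastrophic forgetting, as mentioned in the statements above, is usually diagnosed by re-scoring the helping (high-resource) direction after further fine-tuning and checking for a drop. The following is only an illustration of that bookkeeping with made-up placeholder scores; it is not an evaluation from the paper or the citing works.

```python
# Illustration of spotting catastrophic forgetting: compare the helping-direction
# score before and after pure fine-tuning on the low-resource corpus.
# All numbers are made-up placeholders, not reported results.
checkpoints = {
    "after pre-training on external corpus": {"en->helping": 0.30, "en->target": 0.05},
    "after pure fine-tuning on target corpus": {"en->helping": 0.18, "en->target": 0.21},
}

before = checkpoints["after pre-training on external corpus"]["en->helping"]
after = checkpoints["after pure fine-tuning on target corpus"]["en->helping"]
print(f"Helping-direction score dropped by {before - after:.2f}; "
      "mixed fine-tuning (replaying external data) is the usual mitigation.")
```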