Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 2: Short Papers) 2017
DOI: 10.18653/v1/p17-2061
An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation

Abstract: In this paper, we propose a novel domain adaptation method named "mixed fine tuning" for neural machine translation (NMT). We combine two existing approaches, namely fine tuning and multi-domain NMT. We first train an NMT model on an out-of-domain parallel corpus, and then fine tune it on a parallel corpus which is a mix of the in-domain and out-of-domain corpora. All corpora are augmented with artificial tags to indicate specific domains. We empirically compare our proposed method against fine tuning and multi …
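As a rough illustration of the two-stage recipe described in the abstract, the sketch below prepares tagged corpora and marks the two training stages. The file names, tag strings, and the `train_nmt()` call are hypothetical placeholders, not the authors' actual pipeline.

```python
# Minimal sketch of "mixed fine tuning" (hypothetical file names, tags, and
# training call; shown only to make the two stages concrete).

def load_parallel(src_path, tgt_path):
    """Load a parallel corpus as (source, target) sentence pairs."""
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        return [(s.strip(), t.strip()) for s, t in zip(fs, ft)]

def tag_corpus(pairs, domain_tag):
    """Prepend an artificial domain tag to every source sentence."""
    return [(f"{domain_tag} {src}", tgt) for src, tgt in pairs]

out_domain = tag_corpus(load_parallel("out_domain.src", "out_domain.tgt"), "<2out>")
in_domain = tag_corpus(load_parallel("in_domain.src", "in_domain.tgt"), "<2in>")

# Stage 1: train an NMT model on the (tagged) out-of-domain corpus only.
# model = train_nmt(out_domain)                      # hypothetical training call

# Stage 2: fine-tune the stage-1 model on a mix of tagged in-domain and
# out-of-domain data.
mixed = in_domain + out_domain
# model = train_nmt(mixed, init_from=model)          # continue from stage-1 weights
```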

Cited by 176 publications (188 citation statements) | References 14 publications
“…These techniques are largely orthogonal to ours and can be used in combination. In fact, Chu et al. (2017) successfully apply fine-tuning in combination with joint training.…”
Section: Introduction
confidence: 99%
“…In this aspect, fine-tuning (Luong and Manning, 2015; Zoph et al., 2016; Servan et al., 2016) is the most popular approach, where the NMT model is first trained using the out-of-domain training corpus and then fine-tuned on the in-domain training corpus. To avoid overfitting, Chu et al. (2017) blended in-domain with out-of-domain corpora to fine-tune the pre-trained model, and Freitag and Al-Onaizan (2016) combined the fine-tuned model with the baseline via an ensemble method. Meanwhile, applying data weighting to NMT domain adaptation has attracted much attention.…”
Section: Related Work
confidence: 99%
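For concreteness, here is a minimal sketch of the plain fine-tuning recipe this statement refers to: continue training an out-of-domain NMT model on in-domain batches. The model class, data loader, checkpoint path, and hyperparameters are assumed for illustration, not taken from any cited system.

```python
import torch

# Sketch of fine-tuning for domain adaptation: resume training a pre-trained
# out-of-domain model on in-domain data (assumed model/loader interfaces).

def fine_tune(model, in_domain_loader, lr=1e-4, epochs=2):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # typically smaller LR than pre-training
    model.train()
    for _ in range(epochs):
        for src_batch, tgt_batch in in_domain_loader:
            optimizer.zero_grad()
            loss = model(src_batch, tgt_batch)  # assumes forward() returns the training loss
            loss.backward()
            optimizer.step()
    return model

# model = torch.load("out_of_domain_checkpoint.pt")  # hypothetical pre-trained model
# model = fine_tune(model, in_domain_loader)
```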
“…The other is to use the mixed-domain training corpus to construct a unified NMT model for all domains. Here, we mainly focus on the first type of research, of which typical methods include fine-tuning (Luong and Manning, 2015; Zoph et al., 2016; Servan et al., 2016), mixed fine-tuning (Chu et al., 2017), cost weighting, data selection (Wang et al., 2017a,b; Zhang et al., 2019a), and so on. The underlying assumption of these approaches is that in-domain and out-of-domain NMT models share the same parameter space or prior distributions, and that the useful out-of-domain translation knowledge can be completely transferred to the in-domain NMT model in a one-pass manner.…”
Section: Introduction
confidence: 99%
“…The task requires both style transfer and domain-specific generation in the target domain. To differentiate domains, Sennrich et al. (2016a) and Chu et al. (2017) appended domain tokens to the input sentences. Our model uses learnable domain vectors combined with domain-specific style classifiers, which force the model to learn distinct stylized information in each domain.…”
Section: Related Work
confidence: 99%
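The domain-token idea mentioned in the statement above can be summarized in a few lines: a single multi-domain model is conditioned on the domain by prefixing each source sentence with an artificial tag. The tag inventory below is illustrative only.

```python
# Sketch of domain-token conditioning for a single multi-domain NMT model.
DOMAIN_TAGS = {"medical": "<2med>", "news": "<2news>"}  # illustrative tag names

def add_domain_token(source_sentence: str, domain: str) -> str:
    """Prefix the source sentence with its artificial domain tag."""
    return f"{DOMAIN_TAGS[domain]} {source_sentence}"

print(add_domain_token("the patient was discharged yesterday", "medical"))
# -> <2med> the patient was discharged yesterday
```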