Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1312
Improving Domain Adaptation Translation with Domain Invariant and Specific Information

Abstract: In domain adaptation for neural machine translation, translation performance can benefit from separating features into domain-specific features and common features. In this paper, we propose a method to explicitly model the two kinds of information in the encoder-decoder framework so as to exploit out-of-domain data in in-domain training. In our method, we maintain a private encoder and a private decoder for each domain, which are used to model domain-specific information. In the meantime, we introduce a common e…
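The abstract describes an architecture with a private encoder and decoder per domain plus a common encoder and decoder shared across domains. Below is a minimal PyTorch sketch of that idea for the encoder side only; the class name, the concatenate-and-project merge, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (encoder side only): a private encoder per domain for
# domain-specific features plus one shared common encoder for
# domain-invariant features. All names and hyperparameters are
# illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn

class DomainAdaptiveEncoder(nn.Module):
    def __init__(self, num_domains: int, d_model: int = 512, nhead: int = 8):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # One private encoder per domain: captures domain-specific information.
        self.private = nn.ModuleList(
            [nn.TransformerEncoder(make_layer(), num_layers=2) for _ in range(num_domains)]
        )
        # A single common encoder shared by all domains: captures domain-invariant information.
        self.common = nn.TransformerEncoder(make_layer(), num_layers=2)
        self.merge = nn.Linear(2 * d_model, d_model)

    def forward(self, src: torch.Tensor, domain: int) -> torch.Tensor:
        specific = self.private[domain](src)  # domain-specific features
        invariant = self.common(src)          # domain-invariant features
        # One simple way to combine the two streams for a downstream decoder.
        return self.merge(torch.cat([specific, invariant], dim=-1))

# Usage: batch of 4 source sentences, length 10, embedding size 512, domain 0.
enc = DomainAdaptiveEncoder(num_domains=2)
print(enc(torch.randn(4, 10, 512), domain=0).shape)  # torch.Size([4, 10, 512])
```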

Cited by 31 publications (24 citation statements). References 30 publications.
“…Some work also makes use of word frequency information to help learning, such as in word segmentation (Sun et al., 2014) and term extraction (Frantzi et al., 1998; Vu et al., 2008). In NMT, word frequency information is used for curriculum learning (Kocmi and Bojar, 2017; Platanios et al., 2019) and domain adaptation data selection (Wang et al., 2017; Zhang and Xiong, 2018; Gu et al., 2019). Wang et al. (2020) analyzed the miscalibration problem on low-frequency tokens.…”
Section: Related Work
confidence: 99%
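The statement above mentions word-frequency-based curriculum learning (Kocmi and Bojar, 2017; Platanios et al., 2019). The sketch below illustrates the general idea of ordering training sentences from frequent-word to rare-word; the average-corpus-frequency score is an assumed heuristic for illustration, not the exact criterion used in the cited papers.

```python
# Hedged sketch of frequency-based curriculum ordering: sentences made of
# frequent words are presented before sentences containing rare words.
# The scoring heuristic is an assumption, not any cited paper's criterion.
from collections import Counter

def curriculum_order(corpus: list[list[str]]) -> list[list[str]]:
    freq = Counter(tok for sent in corpus for tok in sent)
    # Higher average token frequency = "easier" = scheduled earlier.
    return sorted(corpus, key=lambda s: -sum(freq[t] for t in s) / len(s))

corpus = [["the", "cat", "sat"], ["anachronistic", "shibboleth"], ["the", "dog", "sat"]]
for sent in curriculum_order(corpus):
    print(sent)  # frequent-word sentences print first
```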
“…Fine-tuning (or continued learning) is a standard domain adaptation method in NMT. Given an NMT model pre-trained with a massive amount of source-domain parallel data, it continues the training of this pre-trained model with a small amount of target-domain parallel data (Luong and Manning, 2015; Chu et al., 2017; Bapna and Firat, 2019; Gu et al., 2019). Unsupervised domain adaptation exploits target-domain monolingual data to train a language model to support the model's decoder in generating natural sentences in a target domain (Gülçehre et al., 2015; Domhan and Hieber, 2017). Data augmentation using back-translation (Sennrich et al., 2016a; Hu et al., 2019) is another approach to using target-domain monolingual data.…”
Section: Related Work
confidence: 99%
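A minimal sketch of the fine-tuning recipe described above: continue training a pre-trained model on a small target-domain set with a small learning rate to limit forgetting. The function, the toy model, and all hyperparameters are illustrative placeholders, not any cited paper's exact setup.

```python
# Hedged sketch of continued training (fine-tuning) on in-domain data.
import torch
import torch.nn as nn

def fine_tune(model: nn.Module, in_domain_loader, epochs: int = 3, lr: float = 1e-5):
    optim = torch.optim.Adam(model.parameters(), lr=lr)  # small LR to limit forgetting
    model.train()
    for _ in range(epochs):
        for src, tgt in in_domain_loader:
            loss = model(src, tgt)  # assumed to return the training loss
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model

# Toy usage: a linear "model" whose forward returns an MSE loss, one batch.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
    def forward(self, src, tgt):
        return ((self.proj(src) - tgt) ** 2).mean()

fine_tune(ToyModel(), [(torch.randn(4, 8), torch.randn(4, 8))], epochs=1)
```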
“…Dakwale and Monz (2017) minimize the cross-entropy between the output distribution of the general-domain model and that of the fine-tuned model. Gu et al. (2019) add a discriminator to help preserve the domain-shared features and fine-tune the whole model on the mixed training data. Jiang et al. (2020) propose to obtain word representations by mixing their embeddings in individual domains based on the domain proportions.…”
Section: Continual Training
confidence: 99%
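The Dakwale-and-Monz-style objective mentioned above can be sketched as a distillation-regularized loss: a cross-entropy term against the frozen general-domain model's output distribution is added to the usual NLL on in-domain references. The mixing weight `alpha` and all names here are assumptions for illustration.

```python
# Hedged sketch: preserve general-domain knowledge during fine-tuning by
# penalizing divergence from the frozen general-domain (teacher) model.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, tgt_ids, alpha: float = 0.5):
    # Usual NLL against the in-domain references (logits: batch x len x vocab).
    nll = F.cross_entropy(student_logits.transpose(1, 2), tgt_ids)
    # Cross-entropy between teacher (general-domain) and student distributions.
    teacher_probs = F.softmax(teacher_logits, dim=-1).detach()
    ce_teacher = -(teacher_probs * F.log_softmax(student_logits, dim=-1)).sum(-1).mean()
    return (1 - alpha) * nll + alpha * ce_teacher

# Dummy usage: batch 2, length 5, vocabulary 100.
s = torch.randn(2, 5, 100, requires_grad=True)
t = torch.randn(2, 5, 100)
y = torch.randint(0, 100, (2, 5))
print(distill_loss(s, t, y))
```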