Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.381

Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation

Abstract: Neural machine translation (NMT) models usually suffer from catastrophic forgetting during continual training: the model gradually forgets previously learned knowledge and swings to fit the newly added data, which may have a different distribution, e.g. a different domain. Although many methods have been proposed to alleviate this problem, what causes the phenomenon is still not well understood. In the context of domain adaptation, we investigate the cause of catastrophic forgetting from the perspec…
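The forgetting pattern described in the abstract can be observed directly by tracking general-domain performance while continuing training on in-domain data. Below is a minimal, hypothetical sketch, not the paper's experimental setup: `model`, `in_domain_loader`, and `general_dev_loader` are placeholder names, and the model is assumed to return a Hugging Face-style output with a `.loss` attribute when the batch contains labels. A general-domain loss that rises while the in-domain loss falls is the catastrophic-forgetting signature.

```python
# Minimal sketch: monitor catastrophic forgetting during continual training
# of an NMT model on new-domain data. `model`, `in_domain_loader`, and
# `general_dev_loader` are hypothetical placeholders, not from the paper.
import torch

def evaluate(model, loader, device="cpu"):
    """Average loss on a held-out general-domain set."""
    model.eval()
    total, batches = 0.0, 0
    with torch.no_grad():
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            total += model(**batch).loss.item()  # assumes HF-style output with .loss
            batches += 1
    return total / max(batches, 1)

def continual_train(model, in_domain_loader, general_dev_loader,
                    epochs=5, lr=1e-5, device="cpu"):
    """Fine-tune on in-domain data and log general-domain loss each epoch.

    A growing gap between the baseline and the current general-domain loss
    is the forgetting signal discussed in the abstract.
    """
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    baseline = evaluate(model, general_dev_loader, device)
    for epoch in range(epochs):
        model.train()
        for batch in in_domain_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        general = evaluate(model, general_dev_loader, device)
        print(f"epoch {epoch}: general-domain loss {general:.3f} "
              f"(drift from baseline: {general - baseline:+.3f})")
```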

Cited by 12 publications (10 citation statements)
References 19 publications (17 reference statements)
“…Given the above, we aim to propose a method of domain adaptation that can not only deal with large domain divergence during domain transfer but also keep a stable model size as the number of domains increases. Inspired by analysis work on NMT (Bau et al., 2019; Voita et al., 2019; Gu and Feng, 2020), we find that only some important parameters in a well-trained NMT model are responsible for generating the translation, and unimportant parameters can be erased without affecting the translation quality too much. According to these findings, we can preserve the important parameters for the general domain, while tuning the unimportant parameters for the in-domain.…”
Section: Introduction (mentioning)
confidence: 76%
“…Given the above, we propose a method of domain adaptation that can not only deal with large domain divergence during domain transfer but also keep a stable model size even with multiple domains. Inspired by the analysis work on NMT (Bau et al., 2019; Voita et al., 2019; Gu and Feng, 2020), we find that only some important parameters in a well-trained NMT model play an important role when generating the translation and unimportant parameters can be erased without affecting the translation quality too much. According to these findings, we can preserve important parameters for general-domain translation, while tuning unimportant parameters for in-domain translation.…”
Section: Introduction (mentioning)
confidence: 85%
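The idea quoted above (keep the general-domain-critical parameters fixed and tune only the unimportant ones on in-domain data) can be sketched as follows. This is a hypothetical illustration, not the cited papers' exact method: parameter importance is approximated here by accumulated |gradient × parameter| on general-domain batches (a Fisher-style proxy), the top fraction of parameters is frozen by zeroing their gradients during in-domain fine-tuning, and `model`, `general_loader`, and `in_domain_loader` are placeholder names assuming an HF-style model output with `.loss`.

```python
# Hypothetical sketch of importance-based parameter freezing for domain
# adaptation, inspired by (but not identical to) the approach quoted above.
import torch

def estimate_importance(model, general_loader, device="cpu", max_batches=100):
    """Accumulate |grad * param| per parameter on general-domain data."""
    model.to(device).train()
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for i, batch in enumerate(general_loader):
        if i >= max_batches:
            break
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss          # assumes HF-style output with .loss
        model.zero_grad()
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += (p.grad * p).abs().detach()
    return importance

def build_freeze_masks(importance, freeze_ratio=0.3):
    """Mark the top `freeze_ratio` fraction of parameters (by importance) as frozen."""
    scores = torch.cat([v.flatten() for v in importance.values()])
    k = max(int((1.0 - freeze_ratio) * scores.numel()), 1)
    threshold = torch.kthvalue(scores, k).values
    return {n: (v >= threshold) for n, v in importance.items()}  # True = frozen

def finetune_in_domain(model, in_domain_loader, masks,
                       epochs=3, lr=1e-5, device="cpu"):
    """Tune only the unimportant parameters; gradients of frozen ones are zeroed."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in in_domain_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss
            optimizer.zero_grad()
            loss.backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    p.grad[masks[n]] = 0.0  # protect general-domain parameters
            optimizer.step()
    return model
```

In this sketch the general-domain knowledge is protected purely by gradient masking; the cited work may instead use other importance criteria or explicit parameter partitioning, so treat the masking step only as one way to realize the quoted idea.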
“…Catastrophic forgetting refers to the loss of previously acquired knowledge in the model during transfer to a new task. To the best of our knowledge, catastrophic forgetting in MT models has only been studied within the context of inter-domain adaptation (Thompson et al., 2019; Gu and Feng, 2020), and not inter-lingual adaptation.…”
Section: Mitigating Forgetting (mentioning)
confidence: 99%