Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.381

Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation

Abstract: Neural machine translation (NMT) models usually suffer from catastrophic forgetting during continual training: the model gradually forgets previously learned knowledge and swings to fit the newly added data, which may have a different distribution, e.g. a different domain. Although many methods have been proposed to alleviate this problem, what causes the phenomenon is still not well understood. In the context of domain adaptation, we investigate the cause of catastrophic forgetting from the perspec…
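The forgetting pattern described in the abstract can be observed directly by tracking general-domain performance while continuing training on in-domain data. Below is a minimal, hypothetical sketch, not the paper's experimental setup: `model`, `in_domain_loader`, and `general_dev_loader` are placeholder names, and the model is assumed to return a Hugging Face-style output with a `.loss` attribute when the batch contains labels. A general-domain loss that rises while the in-domain loss falls is the catastrophic-forgetting signature.

```python
# Minimal sketch: monitor catastrophic forgetting during continual training
# of an NMT model on new-domain data. `model`, `in_domain_loader`, and
# `general_dev_loader` are hypothetical placeholders, not from the paper.
import torch

def evaluate(model, loader, device="cpu"):
    """Average loss on a held-out general-domain set."""
    model.eval()
    total, batches = 0.0, 0
    with torch.no_grad():
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            total += model(**batch).loss.item()  # assumes HF-style output with .loss
            batches += 1
    return total / max(batches, 1)

def continual_train(model, in_domain_loader, general_dev_loader,
                    epochs=5, lr=1e-5, device="cpu"):
    """Fine-tune on in-domain data and log general-domain loss each epoch.

    A growing gap between the baseline and the current general-domain loss
    is the forgetting signal discussed in the abstract.
    """
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    baseline = evaluate(model, general_dev_loader, device)
    for epoch in range(epochs):
        model.train()
        for batch in in_domain_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        general = evaluate(model, general_dev_loader, device)
        print(f"epoch {epoch}: general-domain loss {general:.3f} "
              f"(drift from baseline: {general - baseline:+.3f})")
```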

Cited by 12 publications (10 citation statements)
References 19 publications (17 reference statements)
“…Given the above, we aim to propose a method of domain adaptation that can not only deal with large domain divergence during domain transfer but also keep a stable model size as the number of domains increases. Inspired by analysis work on NMT (Bau et al., 2019; Voita et al., 2019; Gu and Feng, 2020), we find that only some important parameters in a well-trained NMT model are responsible for generating the translation, and unimportant parameters can be erased without affecting the translation quality too much. According to these findings, we can preserve the important parameters for the general domain, while tuning the unimportant parameters for the in-domain.…”
Section: Introduction (mentioning)
confidence: 76%
“…Given the above, we propose a method of domain adaptation that can not only deal with large domain divergence during domain transfer but also keep a stable model size even with multiple domains. Inspired by the analysis work on NMT (Bau et al., 2019; Voita et al., 2019; Gu and Feng, 2020), we find that only some important parameters in a well-trained NMT model play an important role when generating the translation and unimportant parameters can be erased without affecting the translation quality too much. According to these findings, we can preserve important parameters for general-domain translation, while tuning unimportant parameters for in-domain translation.…”
Section: Introduction (mentioning)
confidence: 85%
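The idea quoted above (keep the general-domain-critical parameters fixed and tune only the unimportant ones on in-domain data) can be sketched as follows. This is a hypothetical illustration, not the cited papers' exact method: parameter importance is approximated here by accumulated |gradient × parameter| on general-domain batches (a Fisher-style proxy), the top fraction of parameters is frozen by zeroing their gradients during in-domain fine-tuning, and `model`, `general_loader`, and `in_domain_loader` are placeholder names assuming an HF-style model output with `.loss`.

```python
# Hypothetical sketch of importance-based parameter freezing for domain
# adaptation, inspired by (but not identical to) the approach quoted above.
import torch

def estimate_importance(model, general_loader, device="cpu", max_batches=100):
    """Accumulate |grad * param| per parameter on general-domain data."""
    model.to(device).train()
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for i, batch in enumerate(general_loader):
        if i >= max_batches:
            break
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss          # assumes HF-style output with .loss
        model.zero_grad()
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += (p.grad * p).abs().detach()
    return importance

def build_freeze_masks(importance, freeze_ratio=0.3):
    """Mark the top `freeze_ratio` fraction of parameters (by importance) as frozen."""
    scores = torch.cat([v.flatten() for v in importance.values()])
    k = max(int((1.0 - freeze_ratio) * scores.numel()), 1)
    threshold = torch.kthvalue(scores, k).values
    return {n: (v >= threshold) for n, v in importance.items()}  # True = frozen

def finetune_in_domain(model, in_domain_loader, masks,
                       epochs=3, lr=1e-5, device="cpu"):
    """Tune only the unimportant parameters; gradients of frozen ones are zeroed."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in in_domain_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss
            optimizer.zero_grad()
            loss.backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    p.grad[masks[n]] = 0.0  # protect general-domain parameters
            optimizer.step()
    return model
```

In this sketch the general-domain knowledge is protected purely by gradient masking; the cited work may instead use other importance criteria or explicit parameter partitioning, so treat the masking step only as one way to realize the quoted idea.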
“…Catastrophic forgetting refers to the loss of previously acquired knowledge in the model during transfer to a new task. To the best of our knowledge, catastrophic forgetting in MT models has only been studied within the context of inter-domain adaptation (Thompson et al., 2019; Gu and Feng, 2020), and not inter-lingual adaptation.…”
Section: Mitigating Forgetting (mentioning)
confidence: 99%