Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.308

Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation

Abstract: Domain adaptation is widely used in practical applications of neural machine translation; it aims to achieve good performance on both general-domain and in-domain data. However, existing methods for domain adaptation usually suffer from catastrophic forgetting, large domain divergence, and model explosion. To address these three problems, we propose a "divide and conquer" method based on the importance of neurons or parameters for the translation model. In this method, we first prune the mod…

Cited by 17 publications (16 citation statements) | References 34 publications
“…Identifying the salient neurons with respect to a domain can be effectively used for domain adaptation and generalization. Gu et al. (2021) proposed a domain adaptation method that uses neuron pruning to counter catastrophic forgetting of the general domain when fine-tuning a model for a target domain. They introduced a three-step adaptation process: i) rank the neurons by their importance, ii) prune the unimportant neurons from the network and retrain it within a student-teacher framework, iii) expand the network to its original size and fine-tune it on the in-domain data, freezing the salient neurons and adjusting only the unimportant ones.…”
Section: Domain Adaptation
confidence: 99%
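The three-step recipe quoted above lends itself to a short illustration. Below is a minimal PyTorch sketch of steps i and ii, assuming neuron importance is approximated by mean absolute activation over a general-domain batch; that criterion, the helper names, and the 512→2048 layer are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def neuron_importance(layer: nn.Linear, inputs: torch.Tensor) -> torch.Tensor:
    """Score each output neuron by its mean absolute activation (a proxy;
    the paper's own importance criterion may differ)."""
    with torch.no_grad():
        acts = layer(inputs)              # (batch, out_features)
    return acts.abs().mean(dim=0)         # one score per neuron

def pruning_mask(scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep the top `keep_ratio` fraction of neurons; mask the rest."""
    k = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, k).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[keep] = True
    return mask

# Toy usage: one 512 -> 2048 feed-forward sublayer, keep 75% of neurons.
ffn = nn.Linear(512, 2048)
general_batch = torch.randn(32, 512)
mask = pruning_mask(neuron_importance(ffn, general_batch), keep_ratio=0.75)
ffn.weight.data[~mask] = 0.0              # zero the pruned neurons' weights
ffn.bias.data[~mask] = 0.0
```

The pruned model would then be retrained with the student-teacher framework before the network is expanded back to its original size.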
“…Parameters to be frozen do not necessarily have to come from the same subnetwork. More recent work finds sparse or underused areas of the network that can be easily adapted to new domains (Gu et al., 2021; Liang et al., 2020). A related idea is to factorize existing model components into general-domain and domain-specific parts before tuning (Deng et al., 2020).…”
Section: Freezing Parameters
confidence: 99%
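One common way to realize this kind of selective freezing, and plausibly step iii of the pruning-then-expanding method, is to zero the gradients of the salient (general-domain) neurons during in-domain fine-tuning so that only the re-expanded neurons move. A minimal sketch under that assumption; `freeze_salient_grads` and the toy regression loss are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

def freeze_salient_grads(layer: nn.Linear, salient: torch.Tensor) -> None:
    """Zero the gradients of salient output neurons so only the rest update."""
    if layer.weight.grad is not None:
        layer.weight.grad[salient] = 0.0
    if layer.bias.grad is not None:
        layer.bias.grad[salient] = 0.0

ffn = nn.Linear(512, 2048)
salient = torch.zeros(2048, dtype=torch.bool)
salient[:1536] = True                     # pretend the top 75% are salient

optimizer = torch.optim.SGD(ffn.parameters(), lr=1e-3)
x, y = torch.randn(8, 512), torch.randn(8, 2048)
loss = nn.functional.mse_loss(ffn(x), y)  # stand-in for the real NMT loss
loss.backward()
freeze_salient_grads(ffn, salient)        # salient neurons stay fixed
optimizer.step()                          # only unimportant neurons move
```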
“…Besides, Lan et al. (2020) present two parameter-reduction techniques that lower memory consumption and increase the training speed of BERT. Gu et al. (2021) prune and then expand the model's neurons or parameters based on their importance for domain adaptation of neural machine translation.…”
Section: Related Work
confidence: 99%
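The student-teacher retraining mentioned in the first citation statement is usually realized as a knowledge-distillation loss, in which the pruned student matches the full teacher's output distribution alongside the usual cross-entropy. A minimal sketch; the temperature T and mixing weight alpha are illustrative choices, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      T: float = 2.0, alpha: float = 0.5):
    """Cross-entropy on gold targets mixed with a KL term that pulls the
    student's softened distribution toward the teacher's."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                            # standard temperature scaling
    return alpha * ce + (1 - alpha) * kd

# Toy usage over a vocabulary of 100 tokens.
student = torch.randn(8, 100, requires_grad=True)
teacher = torch.randn(8, 100)
targets = torch.randint(0, 100, (8,))
loss = distillation_loss(student, teacher, targets)
loss.backward()
```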