Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1104
Compact Personalized Models for Neural Machine Translation

Abstract: We propose and compare methods for gradient-based domain adaptation of self-attentive neural machine translation models. We demonstrate that a large proportion of model parameters can be frozen during adaptation with minimal or no reduction in translation quality by encouraging structured sparsity in the set of offset tensors during learning via group lasso regularization. We evaluate this technique for both batch and incremental adaptation across multiple data sets and language pairs. Our system architecture-c…
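To make the abstract's objective concrete, here is a minimal, hedged PyTorch sketch of group-lasso regularization over per-tensor parameter offsets. The function name `adaptation_loss`, the coefficient `lambda_gl`, and the tensor names in the toy dictionary are illustrative assumptions, not the authors' implementation.

```python
import torch

def adaptation_loss(task_loss, offsets, lambda_gl=1e-3):
    # One regularization group per offset tensor: penalizing the L2 norm of
    # each whole tensor drives entire tensors to zero, not individual weights.
    group_lasso = sum(delta.norm(p=2) for delta in offsets.values())
    return task_loss + lambda_gl * group_lasso

# Toy usage with two hypothetical offset tensors (small random init so the
# norm is differentiable away from zero).
offsets = {
    "encoder.layers.5.self_attn.out_proj.weight": 0.01 * torch.randn(512, 512),
    "decoder.layers.5.linear1.weight": 0.01 * torch.randn(2048, 512),
}
for delta in offsets.values():
    delta.requires_grad_(True)

task_loss = torch.tensor(1.0, requires_grad=True)  # stand-in for the NMT cross-entropy
loss = adaptation_loss(task_loss, offsets)
loss.backward()
```

Because the penalty is a sum of per-tensor L2 norms, training pushes many offset tensors exactly to zero, which is what allows the adapted model to be stored compactly on top of a frozen base model.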

Cited by 54 publications (31 citation statements) · References 18 publications
“…Thompson et al. (2018) fine-tune selected components of the base model architecture, in order to determine how much fine-tuning each component contributes to the final adaptation performance. Wuebker et al. (2018) propose introducing sparse offsets from the base model parameters for every domain, reducing the memory complexity of loading and unloading domain-specific parameters in real-world settings. train the base model to utilize neighboring samples from the training set, enabling the model to adapt to new domains without the need for additional parameter updates.…”
Section: Related Work
confidence: 99%
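The memory argument in the excerpt above can be pictured as follows; this is a hedged sketch with invented tensor names, not code from Wuebker et al. (2018): only the nonzero offset tensors are stored per domain and are added to a shared base model on demand.

```python
import torch

def load_domain(base_state, domain_offsets):
    """Return adapted parameters: base weights plus the domain's sparse offsets."""
    adapted = dict(base_state)  # shallow copy; untouched tensors stay shared
    for name, delta in domain_offsets.items():
        adapted[name] = base_state[name] + delta
    return adapted

# Toy usage: a "base model" with two tensors and a domain touching only one,
# because the other offset was pruned to zero during adaptation.
base_state = {"w1": torch.randn(4, 4), "w2": torch.randn(4, 4)}
domain_offsets = {"w2": 0.05 * torch.randn(4, 4)}
adapted_state = load_domain(base_state, domain_offsets)
```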
“…Regularization for segment-wise continued training in NMT has been explored by means of knowledge distillation, and with the group lasso by Wuebker et al. (2018), as used in this paper.…”
Section: Related Work
confidence: 99%
“…Another alternative is freezing parts of the model, for example determining a subset of parameters by performance on a held-out set (Wuebker et al., 2018). In our experiments we use two systems using this method, fixed and top, the former being a pre-determined fixed selection of parameters, and the latter being the topmost encoder and decoder layers in the Transformer NMT model (Vaswani et al., 2017).…”
Section: Online Adaptation
confidence: 99%
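For the top variant described in the excerpt above (freezing everything except the topmost encoder and decoder layers), a rough sketch might look like this; the helper name and the use of `torch.nn.Transformer` are assumptions for illustration, not the cited systems' code.

```python
import torch.nn as nn

def freeze_all_but_top_layers(model: nn.Transformer) -> None:
    # Freeze every parameter first...
    for p in model.parameters():
        p.requires_grad = False
    # ...then unfreeze only the last encoder layer and the last decoder layer.
    for p in model.encoder.layers[-1].parameters():
        p.requires_grad = True
    for p in model.decoder.layers[-1].parameters():
        p.requires_grad = True

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)
freeze_all_but_top_layers(model)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```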
“…Kothur et al. (2018) included a dictionary of translations, to deal with the novel words included in the new domain. Wuebker et al. (2018) proposed to apply sparse updates, to adapt the NMT system to different users.…”
Section: Online Learning in NMT
confidence: 99%