Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1104
Compact Personalized Models for Neural Machine Translation

Abstract: We propose and compare methods for gradient-based domain adaptation of self-attentive neural machine translation models. We demonstrate that a large proportion of model parameters can be frozen during adaptation with minimal or no reduction in translation quality by encouraging structured sparsity in the set of offset tensors during learning via group lasso regularization. We evaluate this technique for both batch and incremental adaptation across multiple data sets and language pairs. Our system architecture-c…
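To make the abstract's objective concrete, here is a minimal, hedged PyTorch sketch of group-lasso regularization over per-tensor parameter offsets. The function name `adaptation_loss`, the coefficient `lambda_gl`, and the tensor names in the toy dictionary are illustrative assumptions, not the authors' implementation.

```python
import torch

def adaptation_loss(task_loss, offsets, lambda_gl=1e-3):
    # One regularization group per offset tensor: penalizing the L2 norm of
    # each whole tensor drives entire tensors to zero, not individual weights.
    group_lasso = sum(delta.norm(p=2) for delta in offsets.values())
    return task_loss + lambda_gl * group_lasso

# Toy usage with two hypothetical offset tensors (small random init so the
# norm is differentiable away from zero).
offsets = {
    "encoder.layers.5.self_attn.out_proj.weight": 0.01 * torch.randn(512, 512),
    "decoder.layers.5.linear1.weight": 0.01 * torch.randn(2048, 512),
}
for delta in offsets.values():
    delta.requires_grad_(True)

task_loss = torch.tensor(1.0, requires_grad=True)  # stand-in for the NMT cross-entropy
loss = adaptation_loss(task_loss, offsets)
loss.backward()
```

Because the penalty is a sum of per-tensor L2 norms, training pushes many offset tensors exactly to zero, which is what allows the adapted model to be stored compactly on top of a frozen base model.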

Cited by 54 publications (31 citation statements) · References 18 publications
“…Thompson et al. (2018) fine-tune selected components of the base model architecture, in order to determine how much fine-tuning each component contributes to the final adaptation performance. Wuebker et al. (2018) propose introducing sparse offsets from the base model parameters for every domain, reducing the memory complexity of loading and unloading domain-specific parameters in real-world settings. train the base model to utilize neighboring samples from the training set, enabling the model to adapt to new domains without the need for additional parameter updates.…”
Section: Related Work
confidence: 99%
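The memory argument in the excerpt above can be pictured as follows; this is a hedged sketch with invented tensor names, not code from Wuebker et al. (2018): only the nonzero offset tensors are stored per domain and are added to a shared base model on demand.

```python
import torch

def load_domain(base_state, domain_offsets):
    """Return adapted parameters: base weights plus the domain's sparse offsets."""
    adapted = dict(base_state)  # shallow copy; untouched tensors stay shared
    for name, delta in domain_offsets.items():
        adapted[name] = base_state[name] + delta
    return adapted

# Toy usage: a "base model" with two tensors and a domain touching only one,
# because the other offset was pruned to zero during adaptation.
base_state = {"w1": torch.randn(4, 4), "w2": torch.randn(4, 4)}
domain_offsets = {"w2": 0.05 * torch.randn(4, 4)}
adapted_state = load_domain(base_state, domain_offsets)
```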
“…Regularization for segment-wise continued training in NMT has been explored by means of knowledge distillation, and with the group lasso by Wuebker et al. (2018), as used in this paper.…”
Section: Related Work
confidence: 99%
“…Another alternative is freezing parts of the model, for example determining a subset of parameters by performance on a held-out set (Wuebker et al., 2018). In our experiments we use two systems using this method, fixed and top, the former being a pre-determined fixed selection of parameters, and the latter being the topmost encoder and decoder layers in the Transformer NMT model (Vaswani et al., 2017).…”
Section: Online Adaptation
confidence: 99%
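For the top variant described in the excerpt above (freezing everything except the topmost encoder and decoder layers), a rough sketch might look like this; the helper name and the use of `torch.nn.Transformer` are assumptions for illustration, not the cited systems' code.

```python
import torch.nn as nn

def freeze_all_but_top_layers(model: nn.Transformer) -> None:
    # Freeze every parameter first...
    for p in model.parameters():
        p.requires_grad = False
    # ...then unfreeze only the last encoder layer and the last decoder layer.
    for p in model.encoder.layers[-1].parameters():
        p.requires_grad = True
    for p in model.decoder.layers[-1].parameters():
        p.requires_grad = True

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)
freeze_all_but_top_layers(model)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```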
“…Kothur et al. (2018) included a dictionary of translations, to deal with the novel words included in the new domain. Wuebker et al. (2018) proposed to apply sparse updates, to adapt the NMT system to different users.…”
Section: Online Learning in NMT
confidence: 99%