Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1312
Improving Domain Adaptation Translation with Domain Invariant and Specific Information

Abstract: In domain adaptation for neural machine translation, translation performance can benefit from separating features into domain-specific features and common features. In this paper, we propose a method to explicitly model the two kinds of information in the encoder-decoder framework so as to exploit out-of-domain data in in-domain training. In our method, we maintain a private encoder and a private decoder for each domain, which are used to model domain-specific information. In the meantime, we introduce a common e…
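The abstract describes an architecture with a private encoder and decoder per domain plus a common encoder and decoder shared across domains. Below is a minimal PyTorch sketch of that idea for the encoder side only; the class name, the concatenate-and-project merge, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (encoder side only): a private encoder per domain for
# domain-specific features plus one shared common encoder for
# domain-invariant features. All names and hyperparameters are
# illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn

class DomainAdaptiveEncoder(nn.Module):
    def __init__(self, num_domains: int, d_model: int = 512, nhead: int = 8):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # One private encoder per domain: captures domain-specific information.
        self.private = nn.ModuleList(
            [nn.TransformerEncoder(make_layer(), num_layers=2) for _ in range(num_domains)]
        )
        # A single common encoder shared by all domains: captures domain-invariant information.
        self.common = nn.TransformerEncoder(make_layer(), num_layers=2)
        self.merge = nn.Linear(2 * d_model, d_model)

    def forward(self, src: torch.Tensor, domain: int) -> torch.Tensor:
        specific = self.private[domain](src)  # domain-specific features
        invariant = self.common(src)          # domain-invariant features
        # One simple way to combine the two streams for a downstream decoder.
        return self.merge(torch.cat([specific, invariant], dim=-1))

# Usage: batch of 4 source sentences, length 10, embedding size 512, domain 0.
enc = DomainAdaptiveEncoder(num_domains=2)
print(enc(torch.randn(4, 10, 512), domain=0).shape)  # torch.Size([4, 10, 512])
```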

Cited by 31 publications (24 citation statements). References 30 publications.
“…Some work also makes use of word frequency information to help learning, such as in word segmentation (Sun et al., 2014) and term extraction (Frantzi et al., 1998; Vu et al., 2008). In NMT, word frequency information is used for curriculum learning (Kocmi and Bojar, 2017; Platanios et al., 2019) and domain adaptation data selection (Wang et al., 2017; Zhang and Xiong, 2018; Gu et al., 2019). Wang et al. (2020) analyzed the miscalibration problem on low-frequency tokens.…”
Section: Related Work
confidence: 99%
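The statement above mentions word-frequency-based curriculum learning (Kocmi and Bojar, 2017; Platanios et al., 2019). The sketch below illustrates the general idea of ordering training sentences from frequent-word to rare-word; the average-corpus-frequency score is an assumed heuristic for illustration, not the exact criterion used in the cited papers.

```python
# Hedged sketch of frequency-based curriculum ordering: sentences made of
# frequent words are presented before sentences containing rare words.
# The scoring heuristic is an assumption, not any cited paper's criterion.
from collections import Counter

def curriculum_order(corpus: list[list[str]]) -> list[list[str]]:
    freq = Counter(tok for sent in corpus for tok in sent)
    # Higher average token frequency = "easier" = scheduled earlier.
    return sorted(corpus, key=lambda s: -sum(freq[t] for t in s) / len(s))

corpus = [["the", "cat", "sat"], ["anachronistic", "shibboleth"], ["the", "dog", "sat"]]
for sent in curriculum_order(corpus):
    print(sent)  # frequent-word sentences print first
```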
“…Fine-tuning (or continued learning) is a standard domain adaptation method in NMT. Given an NMT model pre-trained with a massive amount of source-domain parallel data, it continues the training of this pre-trained model with a small amount of target-domain parallel data (Luong and Manning, 2015; Chu et al., 2017; Bapna and Firat, 2019; Gu et al., 2019). Unsupervised domain adaptation exploits target-domain monolingual data to train a language model to support the model's decoder in generating natural sentences in a target domain (Gülçehre et al., 2015; Domhan and Hieber, 2017). Data augmentation using back-translation (Sennrich et al., 2016a; Hu et al., 2019) is another approach to using target-domain monolingual data.…”
Section: Related Work
confidence: 99%
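A minimal sketch of the fine-tuning recipe described above: continue training a pre-trained model on a small target-domain set with a small learning rate to limit forgetting. The function, the toy model, and all hyperparameters are illustrative placeholders, not any cited paper's exact setup.

```python
# Hedged sketch of continued training (fine-tuning) on in-domain data.
import torch
import torch.nn as nn

def fine_tune(model: nn.Module, in_domain_loader, epochs: int = 3, lr: float = 1e-5):
    optim = torch.optim.Adam(model.parameters(), lr=lr)  # small LR to limit forgetting
    model.train()
    for _ in range(epochs):
        for src, tgt in in_domain_loader:
            loss = model(src, tgt)  # assumed to return the training loss
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model

# Toy usage: a linear "model" whose forward returns an MSE loss, one batch.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
    def forward(self, src, tgt):
        return ((self.proj(src) - tgt) ** 2).mean()

fine_tune(ToyModel(), [(torch.randn(4, 8), torch.randn(4, 8))], epochs=1)
```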
“…Dakwale and Monz (2017) minimize the cross-entropy between the output distribution of the general-domain model and that of the fine-tuned model. Gu et al. (2019) add a discriminator to help preserve the domain-shared features and fine-tune the whole model on the mixed training data. Jiang et al. (2020) propose to obtain word representations by mixing their embeddings in individual domains based on the domain proportions.…”
Section: Continual Training
confidence: 99%
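The Dakwale-and-Monz-style objective mentioned above can be sketched as a distillation-regularized loss: a cross-entropy term against the frozen general-domain model's output distribution is added to the usual NLL on in-domain references. The mixing weight `alpha` and all names here are assumptions for illustration.

```python
# Hedged sketch: preserve general-domain knowledge during fine-tuning by
# penalizing divergence from the frozen general-domain (teacher) model.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, tgt_ids, alpha: float = 0.5):
    # Usual NLL against the in-domain references (logits: batch x len x vocab).
    nll = F.cross_entropy(student_logits.transpose(1, 2), tgt_ids)
    # Cross-entropy between teacher (general-domain) and student distributions.
    teacher_probs = F.softmax(teacher_logits, dim=-1).detach()
    ce_teacher = -(teacher_probs * F.log_softmax(student_logits, dim=-1)).sum(-1).mean()
    return (1 - alpha) * nll + alpha * ce_teacher

# Dummy usage: batch 2, length 5, vocabulary 100.
s = torch.randn(2, 5, 100, requires_grad=True)
t = torch.randn(2, 5, 100)
y = torch.randint(0, 100, (2, 5))
print(distill_loss(s, t, y))
```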