Proceedings of the 12th International Conference on Natural Language Generation 2019
DOI: 10.18653/v1/w19-8629
Selecting Artificially-Generated Sentences for Fine-Tuning Neural Machine Translation

Abstract: Neural Machine Translation (NMT) models tend to achieve their best performance when larger sets of parallel sentences are provided for training. For this reason, augmenting the training set with artificially-generated sentence pairs can boost performance. Nonetheless, performance can also be improved with a small number of sentences if they are in the same domain as the test set. Accordingly, we want to explore the use of artificially-generated sentences along with data-selection algorithms to improve German-to-En…
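The recipe the abstract describes can be sketched end-to-end as follows. This is a minimal illustration, not the paper's code: `back_translate`, `select_in_domain`, and `fine_tune` are hypothetical placeholders for a reverse-direction NMT system, a data-selection algorithm, and continued training of the baseline model.

```python
# Hedged sketch of the pipeline from the abstract; every helper below
# is a hypothetical placeholder, not an API from the paper.
def augment_and_fine_tune(baseline_model, mono_target, in_domain_seed,
                          back_translate, select_in_domain, fine_tune,
                          k=100_000):
    # 1. Back-translate monolingual target-side text into artificial
    #    (synthetic-source, authentic-target) sentence pairs.
    synthetic_pairs = [(back_translate(tgt), tgt) for tgt in mono_target]
    # 2. Keep only the k pairs closest to the domain of the test set.
    selected = select_in_domain(synthetic_pairs, in_domain_seed, k)
    # 3. Fine-tune the pre-built system on the selected pairs.
    return fine_tune(baseline_model, selected)
```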

Cited by 10 publications (7 citation statements).
References 17 publications (14 reference statements).
“…Finally, we are interested in further exploring the algorithms explained in this work using NMT, using different configurations or artificial datasets (Poncelas, de Buy Wenniger, and Way 2019a; Poncelas and Way 2019; Soto et al. 2020). Even if NMT systems work better with large amounts of data, data-selection algorithms are useful for performing so-called "fine-tuning" (Luong and Manning 2015; Freitag and Al-Onaizan 2016), where pre-built systems are improved with a small portion of in-domain data.…”
Section: Discussion (mentioning)
confidence: 99%
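As a concrete illustration of that fine-tuning step, the sketch below continues training a pre-built German-to-English model on a tiny in-domain sample. It assumes the Hugging Face `transformers` and `torch` packages; the checkpoint name, example sentences, and hyperparameters are illustrative assumptions, not the cited papers' setups.

```python
import torch
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-de-en"  # a pre-built German-to-English system
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# A tiny in-domain sample stands in for the selected fine-tuning set.
src = ["Der Patient erhielt zweimal täglich 40 mg."]
tgt = ["The patient received 40 mg twice daily."]
batch = tokenizer(src, text_target=tgt, return_tensors="pt", padding=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # low LR: adapt, don't overwrite
model.train()
for _ in range(3):  # a few passes over the small in-domain set
    loss = model(**batch).loss  # cross-entropy against the target labels
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```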
“…These target sentences are then back-translated. The synthetic-source sentence pairs are typically used directly for fine-tuning the model, but can also be used as candidates for a domain-specific data selection scheme (Poncelas & Way, 2019).…”
Section: Back Translation (mentioning)
confidence: 99%
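One simple instance of such a selection scheme (an assumption here, not necessarily the scheme of Poncelas & Way, 2019) is cross-entropy-difference scoring in the style of Moore & Lewis (2010), sketched below with toy unigram language models built from an in-domain and a general corpus.

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Add-one-smoothed unigram probabilities estimated from a list of sentences."""
    counts = Counter(w for s in corpus for w in s.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def cross_entropy(sentence, lm):
    """Per-word negative log-probability of a sentence under a unigram LM."""
    words = sentence.split()
    return -sum(math.log(lm(w)) for w in words) / max(len(words), 1)

def select_candidates(candidates, in_domain, general, k):
    """Keep the k candidates whose cross-entropy difference
    (in-domain minus general) is lowest, i.e. the most in-domain-like."""
    lm_in, lm_gen = unigram_lm(in_domain), unigram_lm(general)
    key = lambda s: cross_entropy(s, lm_in) - cross_entropy(s, lm_gen)
    return sorted(candidates, key=key)[:k]
```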
“…We trained the system with GSW_NORM-DE and specialised it with GSW_NORM-DE_PE (as suggested in Sennrich and Zhang, 2019). The purpose of this approach is to use a larger corpus with low-quality segments for training to increase vocabulary coverage (Poncelas and Way, 2019) and then to specialise with high-quality segments to eliminate noise.…”
Section: Systems (mentioning)
confidence: 99%
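The two-stage schedule described in that statement could be wired up as follows; `train_epoch` is a hypothetical helper wrapping a standard forward/backward pass, and the epoch counts and learning rates are illustrative assumptions.

```python
# Hypothetical two-stage schedule: broad coverage first, then
# specialisation on the small high-quality (post-edited) set.
def two_stage_training(model, noisy_corpus, clean_corpus, train_epoch):
    for _ in range(10):                 # stage 1: large, lower-quality data
        train_epoch(model, noisy_corpus, lr=5e-4)   # grow vocabulary coverage
    for _ in range(3):                  # stage 2: small, high-quality data
        train_epoch(model, clean_corpus, lr=1e-5)   # reduce noise, specialise
    return model
```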