Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6314

Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

Abstract: Measuring domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not yet. Denoising is concerned with a different type of data quality and tries to reduce the negative impact of data noise on MT training, in particular, neural MT (NMT) training. This paper generalizes methods for measuring and selecting data for domain MT and applies them to denoising NMT training. The proposed approach uses trusted data and a denoising …

Cited by 57 publications (76 citation statements)
References 13 publications
“…1 to select data, the data distribution (domain quality) in the in-domain monolingual data used to train P(x; ϑ) is transferred into the selected data through the scoring. Data selection has also been used for data denoising (Junczys-Dowmunt, 2018; Wang et al., 2018b), by using NMT models and trusted data to measure the noise level in a sentence pair. One such scoring function uses a baseline NMT model θ, trained on noisy data, and a cleaner NMT model θ̃, obtained by fine-tuning θ on a small trusted parallel dataset, and measures the quality of a sentence pair (x, y): φ(x, y; θ, θ̃) = (log P(y|x; θ̃) − log P(y|x; θ)) / |y| …”
Section: Measuring Domain and Noise in Data
confidence: 99%
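The quoted scoring function can be made concrete with a short sketch. The helper below is hypothetical: `nmt_log_prob` stands in for any NMT toolkit call that returns the sentence-level log-probability log P(tgt | src; model); neither the function name nor its signature comes from the paper.

```python
def denoising_score(src_tokens, tgt_tokens, noisy_model, clean_model, nmt_log_prob):
    """phi(x, y; theta, theta~) = (log P(y|x; theta~) - log P(y|x; theta)) / |y|.

    noisy_model -- the baseline NMT model theta, trained on the full noisy corpus.
    clean_model -- theta~, obtained by fine-tuning theta on a small trusted set.
    Pairs that the fine-tuned (cleaner) model prefers over the baseline score
    higher, i.e. they look less noisy; length-normalizing by |y| keeps scores
    comparable across sentence lengths.
    """
    diff = (nmt_log_prob(clean_model, src_tokens, tgt_tokens)
            - nmt_log_prob(noisy_model, src_tokens, tgt_tokens))
    return diff / len(tgt_tokens)
```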
“…van der Wees et al. (2017) introduce a domain curriculum. Wang et al. (2018b) define a noise level and introduce a denoising curriculum. Kocmi and Bojar (2017) use linguistically-motivated features to classify examples into bins for scheduling.…”
Section: Curriculum Learning for NMT
confidence: 99%
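The denoising curriculum mentioned above can be read as online data selection: at each step, score a buffer of candidate pairs with the denoising score and train only on the top-scoring fraction, gradually narrowing toward cleaner data. The sketch below is an illustrative assumption; the buffer handling and linear annealing schedule are not the cited papers' exact settings.

```python
def keep_fraction_at(step, total_steps, start=1.0, end=0.2):
    """Anneal the kept fraction from `start` (use all data) toward `end`
    (only the cleanest slice). The linear schedule is illustrative."""
    t = min(step / float(total_steps), 1.0)
    return start + t * (end - start)

def select_clean_subset(pairs, scores, keep_fraction):
    """Keep the top-scoring fraction of a candidate buffer, where `scores`
    are denoising scores (see the earlier sketch)."""
    k = max(1, int(len(pairs) * keep_fraction))
    ranked = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    return [pairs[i] for i in ranked[:k]]
```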
“…Also, some recent work has focused on NMT model adaptation. For instance, adapting to small datasets with a desired style, a desired vocabulary, or simply reduced data noise has been explored previously (e.g., Farajian et al., 2017; Michel and Neubig, 2018; Wang et al., 2018). More recently, Tan et al. (2020) exposed NMT models to morphologically varied English input to combat bias effects that reduce performance for users with non-native linguistic backgrounds.…”
Section: Gender Bias in NMT Systems
confidence: 99%
“…They experimented with several attention-based encoder-decoder models sharing the general backbone architecture described in , which comprises an encoder with two VGG-like (Simonyan and Zisserman, 2015) CNN blocks followed by five stacked BLSTM layers. All the systems were developed using the ESPnet end-to-end speech processing toolkit (Watanabe et al., 2018). An ASR model trained with Kaldi was used to process the unsegmented test set, with the acoustic model trained on the TED-LIUM 3 corpus.…”
Section: Submissions
confidence: 99%