Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6314

Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

Abstract: Measuring domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not yet. Denoising is concerned with a different type of data quality and tries to reduce the negative impact of data noise on MT training, in particular, neural MT (NMT) training. This paper generalizes methods for measuring and selecting data for domain MT and applies them to denoising NMT training. The proposed approach uses trusted data and a denoising …

Cited by 57 publications (76 citation statements)
References 13 publications
“…1 to select data, the data distribution (domain quality) in the in-domain monolingual data used to train P(x; ϑ) is transferred into the selected data through the scoring. Data selection has also been used for data denoising (Junczys-Dowmunt, 2018; Wang et al., 2018b), by using NMT models and trusted data to measure the noise level in a sentence pair. One such scoring function uses a baseline NMT model θ, trained on noisy data, and a cleaner NMT model θ̃, obtained by fine-tuning θ on a small trusted parallel dataset, and measures the quality of a sentence pair (x, y): φ(x, y; θ, θ̃) = (log P(y|x; θ̃) − log P(y|x; θ)) / |y| …”
Section: Measuring Domain and Noise in Data
confidence: 99%
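The quoted scoring function can be made concrete with a short sketch. The helper below is hypothetical: `nmt_log_prob` stands in for any NMT toolkit call that returns the sentence-level log-probability log P(tgt | src; model); neither the function name nor its signature comes from the paper.

```python
def denoising_score(src_tokens, tgt_tokens, noisy_model, clean_model, nmt_log_prob):
    """phi(x, y; theta, theta~) = (log P(y|x; theta~) - log P(y|x; theta)) / |y|.

    noisy_model -- the baseline NMT model theta, trained on the full noisy corpus.
    clean_model -- theta~, obtained by fine-tuning theta on a small trusted set.
    Pairs that the fine-tuned (cleaner) model prefers over the baseline score
    higher, i.e. they look less noisy; length-normalizing by |y| keeps scores
    comparable across sentence lengths.
    """
    diff = (nmt_log_prob(clean_model, src_tokens, tgt_tokens)
            - nmt_log_prob(noisy_model, src_tokens, tgt_tokens))
    return diff / len(tgt_tokens)
```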
“…van der Wees et al. (2017) introduce a domain curriculum. Wang et al. (2018b) define a noise level and introduce a denoising curriculum. Kocmi and Bojar (2017) use linguistically-motivated features to classify examples into bins for scheduling.…”
Section: Curriculum Learning for NMT
confidence: 99%
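The denoising curriculum mentioned above can be read as online data selection: at each step, score a buffer of candidate pairs with the denoising score and train only on the top-scoring fraction, gradually narrowing toward cleaner data. The sketch below is an illustrative assumption; the buffer handling and linear annealing schedule are not the cited papers' exact settings.

```python
def keep_fraction_at(step, total_steps, start=1.0, end=0.2):
    """Anneal the kept fraction from `start` (use all data) toward `end`
    (only the cleanest slice). The linear schedule is illustrative."""
    t = min(step / float(total_steps), 1.0)
    return start + t * (end - start)

def select_clean_subset(pairs, scores, keep_fraction):
    """Keep the top-scoring fraction of a candidate buffer, where `scores`
    are denoising scores (see the earlier sketch)."""
    k = max(1, int(len(pairs) * keep_fraction))
    ranked = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    return [pairs[i] for i in ranked[:k]]
```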
“…Also, some recent work has focused on NMT model adaptation. For instance, adapting to small datasets with a desired style, a desired vocabulary, or simply reduced data noise has been explored previously (e.g., Farajian et al., 2017; Michel and Neubig, 2018; Wang et al., 2018). More recently, Tan et al. (2020) exposed NMT models to morphologically varied English input to combat bias effects that reduce performance for users with non-native linguistic backgrounds.…”
Section: Gender Bias in NMT Systems
confidence: 99%
“…They experimented with several attention-based encoder-decoder models sharing the general backbone architecture described in , which comprises an encoder with two VGG-like (Simonyan and Zisserman, 2015) CNN blocks followed by five stacked BLSTM layers. All the systems were developed using the ESPnet end-to-end speech processing toolkit (Watanabe et al., 2018). An ASR model trained with Kaldi was used to process the unsegmented test set, with the acoustic model trained on the TED-LIUM 3 corpus.…”
Section: Submissions
confidence: 99%