2022
DOI: 10.48550/arxiv.2205.04686
Preprint

AdMix: A Mixed Sample Data Augmentation Method for Neural Machine Translation

Abstract: In Neural Machine Translation (NMT), data augmentation methods such as back-translation have proven their effectiveness in improving translation performance. In this paper, we propose a novel data augmentation approach for NMT, which is independent of any additional training data. Our approach, AdMix, consists of two parts: 1) introduce faint discrete noise (word replacement, word dropping, word swapping) into the original sentence pairs to form augmented samples; 2) generate new synthetic training data by sof…
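
The first part of the abstract names three discrete noise operations. A minimal sketch of what such operations might look like on a tokenized sentence follows. It is not the authors' implementation: the function names, the per-token `rate` parameter, the adjacent-position swap, and the random choice of one operation per sentence are all assumptions made for illustration, and the second (mixing) step is left out because the abstract is truncated at that point.

```python
import random

def replace_words(tokens, vocab, rate, rng):
    # Replace each token with a random vocabulary word with probability `rate`.
    return [rng.choice(vocab) if rng.random() < rate else t for t in tokens]

def drop_words(tokens, rate, rng):
    # Drop each token with probability `rate`; keep one token if all were dropped.
    kept = [t for t in tokens if rng.random() >= rate]
    if kept or not tokens:
        return kept
    return [rng.choice(tokens)]

def swap_words(tokens, rate, rng):
    # Exchange a token with its right neighbour with probability `rate`.
    out = list(tokens)
    for i in range(len(out) - 1):
        if rng.random() < rate:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def add_faint_noise(tokens, vocab, rate=0.1, seed=None):
    # Hypothetical wrapper: apply one randomly chosen operation to one sentence.
    rng = random.Random(seed)
    op = rng.choice(["replace", "drop", "swap"])
    if op == "replace":
        return replace_words(tokens, vocab, rate, rng)
    if op == "drop":
        return drop_words(tokens, rate, rng)
    return swap_words(tokens, rate, rng)

if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    print(add_faint_noise(sentence, vocab=["dog", "ran", "rug"], rate=0.2, seed=0))
```

A small `rate` keeps the noise "faint" in the sense that most tokens survive unchanged. The abstract says noise is added to sentence pairs; this sketch perturbs one side only and would need to be called on both source and target in a real setup.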

Citations: cited by 1 publication (1 citation statement)
References: 6 publications
“…Sugiyama and Yoshinaga (2019) show that back-translation has a significant positive impact on context-aware large-scale NMT tasks. Several iterations of previous work (Jin et al. 2022, Aji & Heafield 2020, Li & Specia 2019) show that back-translation can supplement other data augmentation techniques to improve performance in neural translation tasks. Such work emphasizes the need to better understand the impact of back-translation in low-resource environments so that such work can keep pace with work in high-resource settings.…”
Section: Back-translation
Citation type: mentioning
confidence: 95%