Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-srw.17
Data Augmentation with Unsupervised Machine Translation Improves the Structural Similarity of Cross-lingual Word Embeddings

Abstract: Unsupervised cross-lingual word embedding (CLWE) methods learn a linear transformation matrix that maps two monolingual embedding spaces that are separately trained with monolingual corpora. This method relies on the assumption that the two embedding spaces are structurally similar, which does not necessarily hold true in general. In this paper, we argue that using a pseudo-parallel corpus generated by an unsupervised machine translation model facilitates the structural similarity of the two embedding spaces a…
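The linear mapping step the abstract describes can be illustrated with a toy orthogonal-Procrustes fit between two embedding spaces. This is a minimal sketch, not the paper's implementation; the data here are synthetic, and structural similarity holds by construction (the second space is an exact rotation of the first):

```python
import numpy as np

# Toy CLWE mapping: recover an orthogonal map W between two embedding
# spaces from anchor pairs, via orthogonal Procrustes.
rng = np.random.default_rng(0)
d, n = 5, 100
W_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # ground-truth rotation

X = rng.normal(size=(n, d))   # "source" embeddings (rows = words)
Y = X @ W_true.T              # "target" space: structurally similar by construction

# Orthogonal Procrustes solution: minimize ||X W - Y||_F over orthogonal W.
# W = U V^T, where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

assert np.allclose(X @ W, Y, atol=1e-6)  # mapping recovers the rotation
```

When the two spaces are not exact rotations of one another (the realistic, noisy case the paper targets), the same SVD step still gives the best orthogonal fit, but its quality degrades with the structural dissimilarity of the spaces.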

Cited by 2 publications (3 citation statements); references 23 publications (15 reference statements).
“…Paraphrasing
- Thesauruses: Zhang et al. [5], Wei et al. [6], Coulombe et al. [7]
- Semantic Embeddings: Wang et al. [8]
- MLMs: Jiao et al. [9]
- Rules: Coulombe et al. [7], Regina et al. [10], Louvan et al. [11]
- Machine Translation
  - Back-translation: Xie et al. [12], Zhang et al. [13]
  - Unidirectional Translation: Nishikawa et al. [14], Bornea et al. [15]
- Model Generation: Hou et al. [16], Li et al. [17], Liu et al. [18]
Noising
- Swapping: Wei et al. [6], Luque et al. [19], Yan et al. [20]
- Deletion: Wei et al. [6], Peng et al. [21], Yu et al. [22]
- Insertion: Wei et al. [6], Peng et al. [21], Yan et al. [20]
- Substitution: Coulombe et al. [7], Xie et al. [23], Louvan et al. [11]
- Mixup: Guo et al. [24], Cheng et al. [25]
Sampling
- Rules: Min et al. [26], Liu et al. [27]
- Seq2Seq Models: Kang et al. [28], Zhang et al. [13], Raille et al. [29]
- Language Models…”
Section: DA for NLP
confidence: 99%
“…In the task of unsupervised cross-lingual word embeddings (CLWEs), Nishikawa et al. [14] build a pseudo-parallel corpus with an unsupervised machine translation model. The authors first train unsupervised machine translation (UMT) models on the source/target training corpora and then translate those corpora with the UMT models.…”
Section: Machine Translation
confidence: 99%
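The pipeline described in that citation statement can be sketched as follows. This is a hedged illustration only: a toy word-substitution table stands in for the trained UMT model, and `translate` and `build_pseudo_parallel` are hypothetical names, not functions from the paper's code.

```python
# Sketch: build a pseudo-parallel corpus by translating a monolingual
# corpus with a (stand-in) unsupervised MT model.
TOY_LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}

def translate(sentence: str) -> str:
    # Stand-in for a trained UMT model's translation step; a real
    # pipeline would call the UMT model here instead.
    return " ".join(TOY_LEXICON.get(w, w) for w in sentence.split())

def build_pseudo_parallel(monolingual_corpus: list[str]) -> list[tuple[str, str]]:
    # Pair each source sentence with its pseudo-translation; the pairs
    # then serve as augmented training data for the embedding spaces.
    return [(s, translate(s)) for s in monolingual_corpus]

pairs = build_pseudo_parallel(["the cat sleeps"])
print(pairs)  # [('the cat sleeps', 'le chat dort')]
```

Training both embedding spaces on text that shares these pseudo-translations is what, per the paper's argument, pushes the two spaces toward the structural similarity that the linear-mapping assumption requires.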