Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1579
Generalized Data Augmentation for Low-Resource Translation

Abstract: Translation to or from low-resource languages (LRLs) poses challenges for machine translation in terms of both adequacy and fluency. Data augmentation utilizing large amounts of monolingual data is regarded as an effective way to alleviate these problems. In this paper, we propose a general framework for data augmentation in low-resource machine translation that not only uses target-side monolingual data, but also pivots through a related high-resource language (HRL). Specifically, we experiment with a two-step…

Cited by 86 publications (50 citation statements)
References 30 publications
“…The biggest disadvantage of these methods is that they do not preserve the contextual meaning of the sentences, so we present more complex approaches that retain the meaning of the original sentence. Back translation aims to obtain more training samples from existing translators, and many research teams have used it to improve translation models [12][13][14][15][23]. The technique works by translating the original data into another language, then feeding the translated data into an independent translator to translate it back to the original language.…”
Section: Data Augmentation
confidence: 99%
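The round-trip procedure described in the excerpt can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the two translator functions are toy word-level lookup stubs standing in for real MT models, and the language pair and vocabulary are invented for the example.

```python
# Toy stand-ins for two independent translation systems. In practice these
# would be trained MT models; here they are word-level lookup tables.

def translate_en_to_de(sentence):
    # Stub EN->DE "translator": per-word dictionary lookup.
    table = {"the": "die", "cat": "katze", "sleeps": "schlaeft"}
    return " ".join(table.get(w, w) for w in sentence.split())

def translate_de_to_en(sentence):
    # Stub DE->EN "translator"; maps "schlaeft" back to the paraphrase "naps",
    # imitating how a real back-translation round trip produces variation.
    table = {"die": "the", "katze": "cat", "schlaeft": "naps"}
    return " ".join(table.get(w, w) for w in sentence.split())

def round_trip_augment(sentences):
    """Round-trip back translation: translate each sentence into a pivot
    language and back again, keeping the result as an extra training sample
    when it differs from (i.e. paraphrases) the original."""
    augmented = []
    for s in sentences:
        pivot = translate_en_to_de(s)      # original -> pivot language
        back = translate_de_to_en(pivot)   # pivot -> back to original language
        if back != s:                      # keep only genuine paraphrases
            augmented.append(back)
    return augmented

print(round_trip_augment(["the cat sleeps"]))  # ['the cat naps']
```

The key design point is that the two directions are independent systems, so the round trip introduces lexical variation rather than reproducing the input verbatim.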
“…Our approach bears similarities to pseudo-corpus approaches that have been used in machine translation (MT), where low-resource language data are augmented with data generated from a related high-resource language. Among many, for instance, De Gispert and Marino (2006) built a Catalan-English MT by bridging through Spanish, while Xia et al (2019) show that word-level substitutions can convert a high-resource (related) language corpus into a pseudo low-resource one leading to large improvements in MT quality. Such approaches typically operate at the word level, hence they do not need to handle script differences explicitly.…”
Section: Introduction
confidence: 99%
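The word-level substitution idea attributed to Xia et al. (2019) in the excerpt can be sketched like this. The sketch is illustrative only: the Spanish-to-Catalan dictionary, the language pair, and the fallback behaviour are assumptions for the example, not the paper's actual setup.

```python
# Hypothetical induced bilingual dictionary: high-resource related language
# (here Spanish) -> low-resource language (here Catalan).
HRL_TO_LRL = {
    "el": "el",
    "gato": "gat",
    "duerme": "dorm",
}

def substitute(sentence, dictionary):
    """Replace each HRL word with its LRL counterpart when covered by the
    dictionary; out-of-dictionary words are copied through unchanged, a
    plausible fallback for closely related languages sharing cognates."""
    return " ".join(dictionary.get(w, w) for w in sentence.split())

def pseudo_lrl_corpus(hrl_parallel, dictionary):
    """Convert an HRL-English parallel corpus into pseudo LRL-English pairs
    by word-level substitution on the source side, keeping targets intact."""
    return [(substitute(src, dictionary), tgt) for src, tgt in hrl_parallel]

corpus = [("el gato duerme", "the cat sleeps")]
print(pseudo_lrl_corpus(corpus, HRL_TO_LRL))
# [('el gat dorm', 'the cat sleeps')]
```

Because the substitution operates purely at the word level, as the excerpt notes, it sidesteps any need to model script differences explicitly.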
“…Recently, there has been growing interest in low-resource NLP, with work in part-of-speech tagging (Plank and Agić, 2018), parsing (Rasooli and Collins, 2017), machine translation (Xia et al, 2019), and other fields. Low-resource NER has seen work using Wikipedia (Tsai et al, 2016), self attention (Xie et al, 2018), and multilingual contextual representations (Wu and Dredze, 2019).…”
Section: Related Work
confidence: 99%