In this work, we investigate the impact of applying textual data augmentation techniques to low-resource machine translation. There has been recent interest in approaches for training systems for languages with limited resources, and one popular approach is data augmentation, which aims to increase the quantity of data available to train the system. In machine translation, the majority of the world's language pairs are considered low-resource because they have little parallel data available, and the quality of neural machine translation (NMT) systems depends heavily on the availability of sizable parallel corpora. We study and apply three simple data augmentation techniques popularly used in text classification tasks (synonym replacement, random insertion, and contextual data augmentation) and compare their performance with baseline neural machine translation on English-Swahili (En-Sw) datasets. We report results in BLEU, ChrF, and METEOR scores. Overall, the contextual data augmentation technique shows some improvements in both the En -> Sw and Sw -> En directions. We see potential for using these methods in neural machine translation, pending more extensive experiments with diverse datasets.
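To make two of the techniques named above concrete, the following is a minimal sketch of synonym replacement and random insertion on a tokenized sentence. It is not the authors' implementation: the `SYNONYMS` table, the function names, and the example sentence are all hypothetical, and a real setup would draw synonyms from a thesaurus or lexical database. The abstract does not say how augmented sentences are paired with their translations, so the sketch covers only the text-level operations. Contextual data augmentation is omitted because it requires a language model.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only);
# a real system would use a thesaurus or lexical database instead.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "small": ["little", "tiny"],
    "house": ["home"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)  # copy so the input list is not mutated
    candidates = [i for i, tok in enumerate(out) if tok in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Insert up to n synonyms of randomly chosen tokens at random positions."""
    out = list(tokens)
    for _ in range(n):
        candidates = [tok for tok in out if tok in SYNONYMS]
        if not candidates:
            break  # nothing in the sentence has a known synonym
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        # randint is inclusive, so the synonym may also be appended at the end
        out.insert(rng.randint(0, len(out)), synonym)
    return out


rng = random.Random(0)  # seeded for repeatable augmentation
sentence = "the quick fox left the small house".split()
augmented_sr = synonym_replacement(sentence, 2, rng)
augmented_ri = random_insertion(sentence, 1, rng)
```

Both functions keep the original sentence intact and return a new token list, so each original sentence can yield several augmented variants by calling them repeatedly with the same random generator.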
Low-resource languages pose a particularly difficult challenge to neural machine translation (NMT), and there appear to be too few machine translation (MT) systems to support African language accessibility. This paper proposes Masakhane Web, an NMT system for African languages. Our approach is an open-source platform that is free, flexible, and produces reasonably accurate translations for African languages. The platform makes use of MT models trained by the Masakhane community. It enables users to generate new data by providing feedback on translations, which is then used to retrain and improve the models. Ultimately, our goal is to create a platform that can provide accurate translations for African languages and make the process of creating MT models easier for those who lack the technical expertise. Furthermore, we include strategies for domain experts to evaluate the system and explain how the platform can be used as a data collection source to improve MT for African languages.