Catherine Gitau scite author profile

Catherine Gitau

2 Publications

2 Citation Statements Received

49 Citation Statements Given

Affiliations

African Institute for Mathematical Sciences Ghana

Publications

Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

Gitau

Marivate

2023

JDHASA

In this work we investigate the impact of applying textual data augmentation tasks to low resource machine translation. There has been recent interest in investigating approaches for training systems for languages with limited resources, and one popular approach is the use of data augmentation techniques. Data augmentation aims to increase the quantity of data that is available to train the system. In machine translation, the majority of language pairs around the world are considered low resource because they have little parallel data available, and the quality of neural machine translation (NMT) systems depends heavily on the availability of sizable parallel corpora. We study and apply three simple data augmentation techniques popularly used in text classification tasks: synonym replacement, random insertion and contextual data augmentation. We compare their performance with baseline neural machine translation for English-Swahili (En-Sw) datasets. We also present results in BLEU, ChrF and METEOR scores. Overall, the contextual data augmentation technique shows some improvements in both the EN -> SW and SW -> EN directions. We see that there is potential to use these methods in neural machine translation when more extensive experiments are done with diverse datasets.
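Two of the techniques named in the abstract, synonym replacement and random insertion, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `SYNONYMS` table, the function names and the parameter `n` are all hypothetical, and the abstract does not say which synonym source was used or how the techniques were applied to parallel sentence pairs.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only);
# a real setup would draw on a thesaurus or similar lexical resource.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "big": ["large", "huge"],
}


def synonym_replacement(tokens, n, rng):
    """Return a copy of tokens with up to n words swapped for a synonym."""
    out = list(tokens)
    # Only positions whose word has an entry in the table can be replaced.
    candidates = [i for i, tok in enumerate(out) if tok in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Return a copy of tokens with n synonyms inserted at random positions."""
    out = list(tokens)
    for _ in range(n):
        candidates = [tok for tok in out if tok in SYNONYMS]
        if not candidates:
            break  # nothing in the sentence has a known synonym
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        # randint is inclusive, so the synonym may also land at the end.
        out.insert(rng.randint(0, len(out)), synonym)
    return out


rng = random.Random(0)  # fixed seed so repeated runs give the same output
source = "the quick dog is happy".split()
print(" ".join(synonym_replacement(source, 1, rng)))
print(" ".join(random_insertion(source, 2, rng)))
```

Both functions leave the input list untouched and return a new one, so a single original sentence can be augmented several times. Contextual data augmentation, the third technique, typically relies on a language model to propose replacements and is not sketched here.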

Masakhane Web: A Machine Translation Platform for African Languages

Gitau

Kabongo

Modupe

et al. 2023

Preprint

Low-resource languages pose a particularly difficult challenge to neural machine translation (NMT), and there appear to be insufficient machine translation (MT) systems to support African language accessibility. Masakhane Web, an NMT system for African languages, is proposed in this paper. Our approach is an open-source platform that is free, flexible, and produces reasonably accurate translations for African languages. The platform makes use of Masakhane community-trained MT models. It enables users to generate new data by providing feedback on translations, which is then used to retrain the models to improve them. Ultimately, our goal is to create a platform that can provide accurate translations for African languages and make the process of creating MT models easier for those who lack the technical expertise. Furthermore, we include strategies for domain experts to evaluate the system and explain how the platform can be used as a data collection source to improve MT for African languages.
