This work explores neural machine translation between Myanmar (Burmese) and Rakhine (Arakanese). Rakhine is closely related to Myanmar and is often considered a dialect of it. We implemented three prominent neural machine translation (NMT) systems: recurrent neural networks (RNN), transformer, and convolutional neural networks (CNN). The systems were evaluated on a Myanmar-Rakhine parallel text corpus that we developed. In addition, we studied two word segmentation schemes for word embeddings: Word-BPE and Syllable-BPE segmentation. Our experimental results show that the highest-quality NMT and statistical machine translation (SMT) performance is obtained with Syllable-BPE segmentation in both translation directions. Within NMT, the transformer with Word-BPE segmentation outperforms CNN and RNN for both Myanmar-Rakhine and Rakhine-Myanmar translation. However, CNN with Syllable-BPE segmentation obtains a higher score than the RNN and transformer.
In this paper we describe our submissions to WAT-2021 (Nakazawa et al., 2021) for the English-to-Myanmar (Burmese) task. Our team, ID: "YCC-MT1", focused on bringing transliteration knowledge to the decoder without changing the model. We manually extracted transliteration word/phrase pairs from the ALT corpus and applied the XML markup feature of the Moses decoder (i.e. -xml-input exclusive, -xml-input inclusive). We demonstrate that this hybrid translation technique can significantly improve (by around 6 BLEU points) on three well-known baselines: "Phrase-based SMT", "Operation Sequence Model" and "Hierarchical Phrase-based SMT". Moreover, this simple hybrid method achieved the second highest results among the MT systems submitted to the English-to-Myanmar WAT2021 translation shared task according to BLEU (Papineni et al., 2002) and AMFM scores (Banchs et al., 2015).
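The XML markup step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a small hand-built dictionary of transliteration pairs and wraps each matching source word or phrase in a tag carrying a `translation` attribute, which is the general shape of input that the Moses `-xml-input` option reads. The function name `mark_transliterations`, the tag name `np`, and the placeholder target strings are hypothetical and not taken from the paper; the escaping here is plain XML escaping, which may differ from the escaping a real Moses pipeline applies.

```python
from xml.sax.saxutils import escape, quoteattr


def mark_transliterations(sentence, pairs, tag="np"):
    """Wrap known source words/phrases in XML markup with a suggested translation.

    `pairs` maps a source word or phrase to its target-side transliteration.
    Longer phrases are matched before shorter ones (greedy, left to right).
    """
    tokens = sentence.split()
    # Longest dictionary entry, measured in whitespace-separated tokens.
    max_len = max((len(key.split()) for key in pairs), default=0)
    out = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in pairs:
                target = quoteattr(pairs[candidate])  # adds surrounding quotes
                out.append(f"<{tag} translation={target}>{escape(candidate)}</{tag}>")
                matched = n
                break
        if matched:
            i += matched
        else:
            out.append(escape(tokens[i]))
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    # Placeholder target strings stand in for Myanmar-script transliterations.
    demo_pairs = {"Yangon": "YANGON_MY", "Nay Pyi Taw": "NPT_MY"}
    print(mark_transliterations("He moved from Yangon to Nay Pyi Taw", demo_pairs))
```

With `-xml-input exclusive` the decoder uses only the supplied translation for a marked span, while with `-xml-input inclusive` the supplied translation competes with phrase-table entries, so the same marked-up input can be decoded under either setting.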
Parallel corpora for the languages of Myanmar (Beik, Burmese, Rakhine) are extremely scarce but are a necessary requirement for machine translation R&D. Previous studies have shown that pivoting leads to better translation quality if the bridge language is closely related to the source and target languages. The baseline study is based on three major approaches to machine translation: Weighted Finite State Transducer (WFST), Phrase-Based Statistical Machine Translation (PBSMT) and Deep Recurrent Neural Network (Deep-RNN). Based on the baseline results, this paper mainly investigates the pivot language technique for PBSMT with Burmese dialects. We employed two different pivot translation methods: transfer (sentence level) and triangulation (phrase level). We present experimental results on Dawei-Beik and Beik-Dawei translation and on Beik-Rakhine and Rakhine-Beik translation via Burmese. Both the transfer and triangulation approaches outperformed the baseline (direct translation), specifically for the Rakhine-Beik language pair. Moreover, the average BiLingual Evaluation Understudy (BLEU), character n-gram F-score (chrF), and Word Error Rate (WER) scores of the 10-fold cross-validation experiments proved that the triangulation pivot has significantly better acceleration than the transfer pivot. We plan to release the parallel corpora of Burmese dialects and present several avenues for further research.
INDEX TERMS Burmese dialects, pivot translation, transfer, triangulation, machine translation.
… "…" ("here") and "…" ("there") in the Dawei language. For instance, the Dawei word "…" is the same as "…" in Burmese, and "…" means "…" in Burmese. The question words "… (…), … (…)" are used in Burmese; similarly, "…, …" are used instead of "… (…)" in Dawei. Moreover, "…" ("what") and "…" ("what happened") are similar to "…" and "…" in Dawei usage.
The Burmese negative word "…" is not used in Dawei. The Dawei negative words are "… (…)" or "…" ("no" in English). The Burmese adverbs "…, …, …" ("very", "extremely") are used as "…, …, …". Some more examples of Dawei vocabulary include "…" ("…" in Burmese; "pregnant" in English), "…" ("…" in Burmese; "boy" in English), "…" ("…" in Burmese; "girl" in English), "…" ("…" in Burmese; "money" in English), "…" "…" in Burmese