This paper introduces a method for automatically translating Vietnamese text into Muong speech in two dialects, Muong Bi - Hoa Binh and Muong Tan Son - Phu Tho, both of which are unwritten dialects of the Muong language. Because Vietnamese and Muong are very closely related, the translation system was built as a cross-lingual speech synthesis system, in which the input is text in one language (Vietnamese) and the output is speech in another (the two Muong dialects). The system uses the sequence-to-sequence acoustic model Tacotron2 and the WaveGlow neural vocoder. The evaluation showed high translation quality (a fluency score of 4.61/5.0 and an adequacy score of 4.79/5.0) and high synthesized speech quality (a naturalness MOS of 4.68/5.0 and an intelligibility of 94.60%). These results suggest that the proposed system is promising for other minority languages, especially unwritten ones.