Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis

Novitasari, Sashi; Tjandra, Andros; Sakti, Sakriani; Nakamura, Satoshi

doi:10.48550/arxiv.2011.02128

Cited by 3 publications

(2 citation statements)

References 12 publications

(15 reference statements)


“…Therefore, it can be said that when ASR is evaluated with WER, its error rate is higher than when it is evaluated with CER. In previous research (Novitasari et al., 2020), the metric used to evaluate the ASR model is CER. This metric is used because the language contains some characters outside the standard alphabet.…”

Section: Discussion (mentioning)

confidence: 99%
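
The citation statement above contrasts word error rate (WER) with character error rate (CER). As a rough illustration only, and not the evaluation code of either paper, the sketch below computes both metrics with a plain Levenshtein edit distance. The function names and the example sentences are hypothetical, and real evaluations usually apply text normalization that is omitted here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (counts substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: one wrong character in a six-word sentence.
reference = "the cat sat on the mat"
hypothesis = "the cat sat on the mad"
print(f"WER = {wer(reference, hypothesis):.3f}")  # 1 of 6 words wrong
print(f"CER = {cer(reference, hypothesis):.3f}")  # 1 of 22 characters wrong
```

In this example a single character error makes a whole word count as wrong, so WER (1/6) comes out higher than CER (1/22). That is the usual pattern, although it is not guaranteed for every pair of transcripts.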

“…Research by Rouditchenko et al. (2023) compared the performance of the XLS-R and Whisper models in zero-shot conditions (without fine-tuning), where model performance is lower on less-seen or unseen languages, which can be categorized as low-resource languages. There is research (Novitasari et al., 2020) to build an ASR model for ethnic languages in Indonesia. One of the best results is for recognizing speech in the Javanese language, with a CER of 20.20%.…”

Section: Introduction (mentioning)

confidence: 99%


Analysis of Whisper Automatic Speech Recognition Performance on Low Resource Language

Pratama, Amrullah

2024

pilar


Implementing automatic speech recognition (ASR) technology in daily life could bring convenience to its users. However, the speech that ASR models can currently recognize accurately is in languages considered high-resource, like English. In previous research, a few regional languages such as Javanese, Sundanese, Balinese, and Bataknese were used in automatic speech recognition. This research aims to improve speech recognition using the ASR model on a low-resource language. The dataset used in this research is the Javanese dataset specifically because there is a high-quality Javanese speech dataset provided by previous research. The method used is fine-tuning the Whisper model, which has been trained on 680,000 hours of multilingual voice data, using a Javanese speech dataset. To reduce computation requirements, parameter-efficient fine-tuning (PEFT) is implemented in the fine-tuning process. With PEFT, the trainable parameters are reduced to <1%, which reduces the computation required for fine-tuning. The best WER evaluation result is 13.77%, achieved by the fine-tuned Whisper large-v2 model, compared to the base Whisper large-v2 model, which achieves 89.40% in WER evaluation. This improvement in WER shows that fine-tuning effectively improves the performance of the Whisper automatic speech recognition model on speech in low-resource languages like Javanese, compared to the original Whisper model, with minimal computational cost for fine-tuning a large model.


IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

Cahyawijaya, Winata, Wilie, et al.

2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing


Natural language generation (NLG) benchmarks provide an important avenue to measure progress and develop better NLG systems. Unfortunately, the lack of publicly available NLG benchmarks for low-resource languages poses a challenging barrier for building NLG systems that work well for languages with limited amounts of data. Here we introduce IndoNLG, the first benchmark to measure natural language generation (NLG) progress in three low-resource yet widely spoken languages of Indonesia: Indonesian, Javanese, and Sundanese. Altogether, these languages are spoken by more than 100 million native speakers, and hence constitute an important use case of NLG systems today. Concretely, IndoNLG covers six tasks: summarization, question answering, chit-chat, and three different pairs of machine translation (MT) tasks. We collate a clean pretraining corpus of Indonesian, Sundanese, and Javanese datasets, Indo4B-Plus, which is used to pretrain our models: IndoBART and IndoGPT. We show that IndoBART and IndoGPT achieve competitive performance on all tasks, despite using only one-fifth the parameters of a larger multilingual model, mBART LARGE. This finding emphasizes the importance of pretraining on closely related, local languages to achieve more efficient learning and faster inference for very low-resource languages like Javanese and Sundanese. Beyond the clean pretraining data, we publicly release all pretrained models and tasks at https://github.com/indobenchmark/indonlg to facilitate NLG research in these languages.


Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation

Koto, Koto

2020

Preprint


Although some linguists (Rusmali et al., 1985; Crouch, 2009) have fairly attempted to define the morphology and syntax of Minangkabau, information processing in this language is still absent due to the scarcity of the annotated resource. In this work, we release two Minangkabau corpora: sentiment analysis and machine translation that are harvested and constructed from Twitter and Wikipedia. We conduct the first computational linguistics in Minangkabau language employing classic machine learning and sequence-to-sequence models such as LSTM and Transformer. Our first experiments show that the classification performance over Minangkabau text significantly drops when tested with the model trained in Indonesian. Whereas, in the machine translation experiment, a simple word-to-word translation using a bilingual dictionary outperforms LSTM and Transformer model in terms of BLEU score.
