Proceedings of the 17th International Conference on Spoken Language Translation 2020
DOI: 10.18653/v1/2020.iwslt-1.24

CUNI Neural ASR with Phoneme-Level Intermediate Step for Non-Native SLT at IWSLT 2020

Abstract: In this paper, we present our submission to the Non-Native Speech Translation Task for IWSLT 2020. Our main contribution is a proposed speech recognition pipeline that consists of an acoustic model and a phoneme-to-grapheme model. As an intermediate representation, we utilize phonemes. We demonstrate that the proposed pipeline surpasses commercially used automatic speech recognition (ASR) and submit it to the ASR track. We complement this ASR with off-the-shelf MT systems to take part also in the speech trans…
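The abstract describes a two-stage recognition pipeline: an acoustic model that emits phonemes, followed by a phoneme-to-grapheme (P2G) model that produces the final transcript. Below is a minimal sketch of that composition; the class and method names (AcousticModel, PhonemeToGraphemeModel, transcribe_phonemes, translate) are illustrative assumptions, not the authors' actual code.

```python
# Illustrative sketch of a two-stage ASR pipeline with a phoneme-level intermediate
# representation. AcousticModel and PhonemeToGraphemeModel are hypothetical
# interfaces, not the authors' actual classes.

from dataclasses import dataclass
from typing import List, Protocol


class AcousticModel(Protocol):
    def transcribe_phonemes(self, audio: bytes) -> List[str]:
        """Map raw audio to a phoneme sequence, e.g. ['HH', 'AH', 'L', 'OW']."""
        ...


class PhonemeToGraphemeModel(Protocol):
    def translate(self, phonemes: List[str]) -> str:
        """Map a phoneme sequence to a grapheme (text) transcript."""
        ...


@dataclass
class PhonemeIntermediateASR:
    acoustic_model: AcousticModel
    p2g_model: PhonemeToGraphemeModel

    def recognize(self, audio: bytes) -> str:
        # Stage 1: the acoustic model emits phonemes instead of graphemes.
        phonemes = self.acoustic_model.transcribe_phonemes(audio)
        # Stage 2: the phoneme-to-grapheme model produces the final transcript,
        # which can then be passed to an off-the-shelf MT system for SLT.
        return self.p2g_model.translate(phonemes)
```

Framing the second stage this way lets standard sequence-to-sequence machinery be reused, and the resulting transcript can then be handed to an off-the-shelf MT system for the speech translation track.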

Cited by 4 publications (3 citation statements)
References 21 publications (12 reference statements)
“…The Uzbek language is limited in terms of data set, since it could not develop naturally for a long time, but the use of Bayesian models can be useful in terms of building rhythmic and intonational parameters. Polák et al [39] develop a speech recognition pipeline consisting of an acoustic model and a phoneme-grapheme model. Such a system is superior to automatic speech recognition.…”
Section: Using Different Technologies For Phonemic Speech Recognition... (mentioning)
confidence: 99%
“…However, these studies were primarily designed for a monolingual setup, and their main goal was to perform spelling correction rather than involving P2G translation. In the field of two-pass ASR with P2G translation, a notable study by [15] focuses on utilizing phonemes as an intermediate representation. They introduce a comprehensive two-pass ASR system incorporating phoneme recognition and P2G translation stages.…”
Section: Two-pass Automatic Speech Recognition (mentioning)
confidence: 99%
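Because the second pass treats phoneme sequences as a source "language", the P2G stage can be trained like a text-to-text translation model on parallel phoneme–grapheme data. A minimal sketch of preparing such a parallel corpus follows; the phonemize() helper, the toy lexicon, and the file names are assumptions for illustration only, not the setup used in the paper.

```python
# A sketch of preparing parallel data for a phoneme-to-grapheme (P2G) translation
# model, treating P2G as text-to-text translation. The phonemize() helper and the
# toy lexicon are illustrative assumptions, not a real G2P resource.

from typing import Iterable, List


def phonemize(text: str) -> List[str]:
    """Toy grapheme-to-phoneme lookup used only to build example training pairs."""
    toy_lexicon = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
    phonemes: List[str] = []
    for word in text.lower().split():
        # Fall back to spelling out unknown words character by character.
        phonemes.extend(toy_lexicon.get(word, list(word)))
    return phonemes


def write_parallel_corpus(sentences: Iterable[str],
                          src_path: str = "train.phn",
                          tgt_path: str = "train.txt") -> None:
    # Source side: space-separated phoneme tokens; target side: the reference text.
    with open(src_path, "w", encoding="utf-8") as src, \
            open(tgt_path, "w", encoding="utf-8") as tgt:
        for sentence in sentences:
            src.write(" ".join(phonemize(sentence)) + "\n")
            tgt.write(sentence + "\n")


if __name__ == "__main__":
    write_parallel_corpus(["hello world"])
```

In practice the phoneme side would come from a pronunciation lexicon or a G2P tool rather than the toy lookup used here.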
“…Additionally, P2G translation can be further enhanced through training with noisy text data, enabling robust performance in noisy ASR hypotheses. Previous studies such as [11,12,15] have employed the K-fold method to generate ASR noise for training the translation model. Another approach, as seen in [14], involves generating synthetic audio and applying ASR inference to produce noisy data for a translator.…”
Section: Two-pass Automatic Speech Recognition (mentioning)
confidence: 99%
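The K-fold noising idea referenced above can be sketched as follows: split the transcribed corpus into K folds, train an ASR model on K-1 folds, decode the held-out fold, and pair the resulting noisy hypotheses with the reference transcripts as training data for the P2G or correction model. The train_asr and decode callables in this sketch are hypothetical stand-ins, not a specific toolkit's API.

```python
# A sketch of the K-fold noising scheme described above: train ASR on K-1 folds,
# decode the held-out fold, and collect (noisy hypothesis, reference) pairs.
# train_asr and decode are hypothetical callables, not a specific toolkit's API.

from typing import Callable, List, Sequence, Tuple

Utterance = Tuple[bytes, str]  # (audio, reference transcript)


def kfold_noisy_pairs(
    data: Sequence[Utterance],
    k: int,
    train_asr: Callable[[Sequence[Utterance]], object],
    decode: Callable[[object, bytes], str],
) -> List[Tuple[str, str]]:
    """Return (noisy ASR hypothesis, reference transcript) pairs covering the corpus."""
    folds = [list(data[i::k]) for i in range(k)]
    pairs: List[Tuple[str, str]] = []
    for held_out_idx, held_out in enumerate(folds):
        # Training excludes the held-out fold, so decoding it yields hypotheses
        # with realistic recognition errors rather than memorized transcripts.
        train_split = [utt for i, fold in enumerate(folds)
                       if i != held_out_idx for utt in fold]
        model = train_asr(train_split)
        for audio, reference in held_out:
            pairs.append((decode(model, audio), reference))
    return pairs
```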