2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178767
Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks

Abstract: Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-speech systems, as they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence-based G2P approaches, LSTMs have the flexibility of taking the full context of graphemes into consideration, transforming the problem from a series of grapheme-to-phoneme conversions into a single word-to-pronunciation conversion. Training jo…
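The abstract's key point is that an LSTM encodes the whole grapheme sequence into a state that summarizes the full spelling context, rather than converting graphemes one at a time. A minimal, untrained pure-Python sketch of the LSTM forward recurrence over one-hot grapheme vectors is shown below; the tiny random weights and the `encode` helper are illustrative assumptions, not the paper's actual architecture:

```python
import math
import random

def lstm_step(x, h, c, W):
    """One LSTM time step: x is the input vector, (h, c) the previous
    hidden and cell states. W maps each gate name to a weight matrix
    over the concatenated vector [x; h; 1] (bias folded in)."""
    z = x + h + [1.0]
    def affine(M):
        return [sum(w * v for w, v in zip(row, z)) for row in M]
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    i = [sig(v) for v in affine(W["i"])]          # input gate
    f = [sig(v) for v in affine(W["f"])]          # forget gate
    o = [sig(v) for v in affine(W["o"])]          # output gate
    g = [math.tanh(v) for v in affine(W["g"])]    # candidate update
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

random.seed(0)
GRAPHEMES = "abcdefghijklmnopqrstuvwxyz"
HIDDEN = 4
W = {gate: [[random.uniform(-0.1, 0.1)
             for _ in range(len(GRAPHEMES) + HIDDEN + 1)]
            for _ in range(HIDDEN)]
     for gate in "ifog"}

def encode(word):
    """Run the LSTM over one-hot grapheme vectors; the final hidden
    state summarizes the entire word's spelling context, which a
    decoder could then map to a pronunciation."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for ch in word:
        x = [1.0 if g == ch else 0.0 for g in GRAPHEMES]
        h, c = lstm_step(x, h, c, W)
    return h

state = encode("phoenix")
```

In a trained model these weights would be learned, and the state would feed an output layer (or decoder LSTM) producing phonemes; here the point is only the recurrence that carries full-word context.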

Cited by 160 publications (131 citation statements). References 11 publications.
“…[campINg] X-SAMPA, extended speech assessment methods phonetic alphabet; IPA, international phonetic alphabet (Yvon et al., 1998; Béchet, 2001; de Mareüil et al., 2005; Rao et al., 2015). (Byrd & Tzoukermann, 1988; Gruaz et al., 1996).…”
Section: In languages such as Korean and French, many phonological phenomena occur, relative to spelling, both within a word (an eojeol, in the Korean case) and between words (mentioning)
confidence: 99%
“…There is a lot of research on applying machine learning to grapheme-to-phoneme conversion (G2P), including decision tree classifiers to learn pronunciation rules [3], joint n-gram models [4], maximum entropy models [5], active learning [6], and most recently recurrent neural networks [7]. In this paper, instead of focusing on improving machine learning G2P techniques, we strive to learn pronunciations from recognition-correction data.…”
Section: Related Work (mentioning)
confidence: 99%
“…The classical learning paradigm in each of these settings is to train a model on pairs of strings {(x, y)} and then to evaluate model performance on test data. While there are exceptions (e.g., (Rao et al., 2015)), most state-of-the-art models (e.g., (Jiampojamarn et al., 2007; Bisani and Ney, 2008; Jiampojamarn et al., 2008; Novak et al., 2012)) view string transduction as a two-stage process in which string pairs (x, y) in the training data are first aligned, and then a subsequent (e.g., sequence labeling) module is learned on the aligned data.

Table 1: Sample monotone many-to-many alignment between x = phoenix and y = finIks.
ph  oe  n  i  x
f   i   n  I  ks
…”
Section: Introduction (mentioning)
confidence: 99%
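The two-stage paradigm described in this excerpt starts by aligning each grapheme string to its phoneme string. A simplified, self-contained sketch of that first stage is below, using a standard edit-distance dynamic program with a backtrace that emits one-to-one pairs and epsilon gaps ("-"); a real many-to-many aligner (as in the cited work) would also allow multi-character chunks like ph→f or x→ks, which this toy version does not:

```python
def align(x, y, gap=1):
    """Monotone one-to-one alignment of strings x and y via edit-distance
    dynamic programming. Returns a list of (grapheme, phoneme) pairs,
    with "-" marking an epsilon (insertion or deletion)."""
    sub = lambda a, b: 0 if a == b else 1
    n, m = len(x), len(y)
    # D[i][j] = minimal cost of aligning x[:i] with y[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    # Backtrace from (n, m) to recover the alignment path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            pairs.append((x[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", y[j - 1]))
            j -= 1
    return pairs[::-1]

pairs = align("phoenix", "finIks")
```

The second stage of the paradigm would then train a sequence-labeling model on such aligned pairs; the LSTM approach of Rao et al. (2015) is notable precisely because it skips this explicit alignment step.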