2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178767
Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks

Abstract: Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-speech systems, as they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence-based G2P approaches, LSTMs have the flexibility of taking the full context of graphemes into consideration, transforming the problem from a series of grapheme-to-phoneme conversions into a single word-to-pronunciation conversion. Training jo…
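The abstract's key point is that an LSTM encodes the whole grapheme sequence into a state that summarizes the full spelling context, rather than converting graphemes one at a time. A minimal, untrained pure-Python sketch of the LSTM forward recurrence over one-hot grapheme vectors is shown below; the tiny random weights and the `encode` helper are illustrative assumptions, not the paper's actual architecture:

```python
import math
import random

def lstm_step(x, h, c, W):
    """One LSTM time step: x is the input vector, (h, c) the previous
    hidden and cell states. W maps each gate name to a weight matrix
    over the concatenated vector [x; h; 1] (bias folded in)."""
    z = x + h + [1.0]
    def affine(M):
        return [sum(w * v for w, v in zip(row, z)) for row in M]
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    i = [sig(v) for v in affine(W["i"])]          # input gate
    f = [sig(v) for v in affine(W["f"])]          # forget gate
    o = [sig(v) for v in affine(W["o"])]          # output gate
    g = [math.tanh(v) for v in affine(W["g"])]    # candidate update
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

random.seed(0)
GRAPHEMES = "abcdefghijklmnopqrstuvwxyz"
HIDDEN = 4
W = {gate: [[random.uniform(-0.1, 0.1)
             for _ in range(len(GRAPHEMES) + HIDDEN + 1)]
            for _ in range(HIDDEN)]
     for gate in "ifog"}

def encode(word):
    """Run the LSTM over one-hot grapheme vectors; the final hidden
    state summarizes the entire word's spelling context, which a
    decoder could then map to a pronunciation."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for ch in word:
        x = [1.0 if g == ch else 0.0 for g in GRAPHEMES]
        h, c = lstm_step(x, h, c, W)
    return h

state = encode("phoenix")
```

In a trained model these weights would be learned, and the state would feed an output layer (or decoder LSTM) producing phonemes; here the point is only the recurrence that carries full-word context.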

Cited by 160 publications (131 citation statements). References 11 publications.
“…[campINg] X-SAMPA, extended speech assessment methods phonetic alphabet; IPA, international phonetic alphabet (Yvon et al., 1998; Béchet, 2001; de Mareüil et al., 2005; Rao et al., 2015). (Byrd & Tzoukermann, 1988; Gruaz et al., 1996).…”
Section: In languages such as Korean and French, many phonological phenomena occur, relative to spelling, both within a word (an eojeol, in the Korean case) and between words (mentioning)
confidence: 99%
“…There is a lot of research on applying machine learning to grapheme-to-phoneme conversion (G2P), including decision tree classifiers to learn pronunciation rules [3], joint n-gram models [4], maximum entropy models [5], active learning [6], and most recently recurrent neural networks [7]. In this paper, instead of focusing on improving machine learning G2P techniques, we strive to learn pronunciations from recognition-correction data.…”
Section: Related Work (mentioning)
confidence: 99%
“…The classical learning paradigm in each of these settings is to train a model on pairs of strings {(x, y)} and then to evaluate model performance on test data. While there are exceptions (e.g., (Rao et al., 2015)), most state-of-the-art models (e.g., (Jiampojamarn et al., 2007; Bisani and Ney, 2008; Jiampojamarn et al., 2008; Novak et al., 2012)) view string transduction as a two-stage process in which string pairs (x, y) in the training data are first aligned, and then a subsequent (e.g., sequence labeling) module is learned on the aligned data.

Table 1: Sample monotone many-to-many alignment between x = phoenix and y = finIks.
ph  oe  n  i  x
f   i   n  I  ks
…”
Section: Introduction (mentioning)
confidence: 99%
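The two-stage paradigm described in this excerpt starts by aligning each grapheme string to its phoneme string. A simplified, self-contained sketch of that first stage is below, using a standard edit-distance dynamic program with a backtrace that emits one-to-one pairs and epsilon gaps ("-"); a real many-to-many aligner (as in the cited work) would also allow multi-character chunks like ph→f or x→ks, which this toy version does not:

```python
def align(x, y, gap=1):
    """Monotone one-to-one alignment of strings x and y via edit-distance
    dynamic programming. Returns a list of (grapheme, phoneme) pairs,
    with "-" marking an epsilon (insertion or deletion)."""
    sub = lambda a, b: 0 if a == b else 1
    n, m = len(x), len(y)
    # D[i][j] = minimal cost of aligning x[:i] with y[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    # Backtrace from (n, m) to recover the alignment path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            pairs.append((x[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", y[j - 1]))
            j -= 1
    return pairs[::-1]

pairs = align("phoenix", "finIks")
```

The second stage of the paradigm would then train a sequence-labeling model on such aligned pairs; the LSTM approach of Rao et al. (2015) is notable precisely because it skips this explicit alignment step.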