Automatic phonetic baseform determination

Bahl, Lalit R.; Das, Subrata; deSouza, P.V.; Epstein, Mark; Mercer, R.L.; Mérialdo, Bernard; Nahamoo, D.; Picheny, Michael; Powell, James A.

doi:10.3115/116580.116641

Cited by 16 publications

(10 citation statements)

References 11 publications

(15 reference statements)


“…Of particular interest to this work, are instances in which spoken examples are used to refine pronunciations [25]-[29]. The work of [27], for example, deduces a pronunciation given a word or grapheme sequence and an utterance of the spoken word. This research uses a decision tree to model which was later shown to produce poor results when compared to graphone models on L2S tasks.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Lexicons From Speech Using a Pronunciation Mixture Model

McGraw, Badr, Glass. 2013. IEEE Trans. Audio Speech Lang. Process.


Abstract: In many ways, the lexicon remains the Achilles heel of modern automatic speech recognizers. Unlike stochastic acoustic and language models that learn the values of their parameters from training data, the baseform pronunciations of words in a recognizer's lexicon are typically specified manually, and do not change, unless they are edited by an expert. Our work presents a novel generative framework that uses speech data to learn stochastic lexicons, thereby taking a step towards alleviating the need for manual intervention and automatically learning high-quality pronunciations for words. We test our model on continuous speech in a weather information domain. In our experiments, we see significant improvements over a manually specified "expert-pronunciation" lexicon. We then analyze variations of the parameter settings used to achieve these gains.

Index Terms: Baseform generation, dictionary training with acoustics via EM, pronunciation learning, stochastic lexicon.


“…Out-of-vocabulary (OOV) words are the bottleneck in large-vocabulary open-domain speech recognition systems [1] and text-to-speech systems [2]. In order to solve the problem of OOV words, Grapheme-to-phoneme (g2p) conversion, which is structured learning problems for which there are an extremely large number of candidate answers, has been used for a long time.…”

Section: Introduction (mentioning)

confidence: 99%

Narrow Adaptive Regularization of weights for grapheme-to-phoneme conversion

Kubo, Sakti, Neubig, et al. 2014. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


As the speech recognition field proceeds to open domain and multilingual tasks, the need for robust g2p conversion has been increasing. Towards this objective, we propose a new g2p conversion training method based on the Narrow Adaptive Regularization of Weights (NAROW) online learning algorithm. NAROW improves over its predecessor AROW by automatically adjusting hyperparameters to reduce mistake bounds, and ensuring that the learning rate is not updated when features for the input data have already been updated enough. The contribution of this paper is first to extend NAROW to structured learning, and show the inequality to bound the maximum number of errors in structured NAROW. In experiments, our proposed approach significantly improved over MIRA with consistent phoneme error rate reductions of 1.3-3.8% on a variety of dictionaries.

Index Terms: g2p conversion, out-of-vocabulary word, online discriminative training, structured learning, NAROW

“…There has been significant research on automatic lexical generation [5,6,7]. However, the novel contribution of this work is two-fold: (1) Spoken examples of both the spelling and the word are used as opposed to the word only, and (2) a bi-directional L2S model is used to exchange bias information between the spelling and pronunciation domain so as to boost the overall performance of the tandem model.…”

Section: Introduction (mentioning)

confidence: 99%

A turbo-style algorithm for lexical baseforms estimation

Choueiter, Ohannessian, Seneff, et al. 2008. 2008 IEEE International Conference on Acoustics, Speech and Signal Processing


In this research, an iterative and unsupervised Turbo-style algorithm is presented and implemented for the task of automatic lexical acquisition. The algorithm makes use of spoken examples of both spellings and words and fuses information from letter and subword recognizers to boost the overall lexical learning performance. The algorithm is tested on a challenging lexicon of restaurant and street names and evaluated in terms of spelling accuracy and letter error rate. Absolute improvements of 7.2% and 3% (15.5% relative improvement) are obtained in the spelling accuracy and the letter error rate respectively following only 2 iterations of the algorithm.
