2014
DOI: 10.1162/coli_a_00198

Applications of Lexicographic Semirings to Problems in Speech and Language Processing

Abstract: This article explores lexicographic semirings and their application to problems in speech and language processing. Specifically, we present two instantiations of binary lexicographic semirings, one involving a pair of tropical weights, and the other a tropical weight paired with a novel string semiring we term the categorial semiring. The first of these is used to yield an exact encoding of backoff models with epsilon transitions. This lexicographic language model semiring allows for off-line optimization of e…
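
The first instantiation described in the abstract, a pair of tropical weights combined lexicographically, can be illustrated with a short sketch. The code below is a minimal illustration under standard definitions of the tropical (min, +) semiring and component-wise lexicographic comparison; it is not the authors' implementation (which is built on weighted finite-state transducer machinery), and the class and variable names are hypothetical.

# Minimal sketch of a binary lexicographic semiring over two tropical weights.
# Assumptions: each tropical component uses (min, +) with identities inf and 0;
# semiring "plus" is lexicographic min over the pair, with the first coordinate
# taking priority; "times" adds the weights component-wise.
# Hypothetical names; not the paper's or OpenFst's implementation.

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LexTropical:
    w1: float  # primary tropical weight
    w2: float  # secondary tropical weight (tie-breaker)

    def plus(self, other: "LexTropical") -> "LexTropical":
        # Semiring addition: lexicographic minimum of the two pairs.
        return self if (self.w1, self.w2) <= (other.w1, other.w2) else other

    def times(self, other: "LexTropical") -> "LexTropical":
        # Semiring multiplication: component-wise addition of weights.
        return LexTropical(self.w1 + other.w1, self.w2 + other.w2)


ZERO = LexTropical(math.inf, math.inf)  # additive identity
ONE = LexTropical(0.0, 0.0)             # multiplicative identity

# Example: prefer the path with the smaller primary cost; break ties on the secondary.
a = LexTropical(1.5, 0.0)
b = LexTropical(1.5, 2.0)
print(a.plus(b))   # LexTropical(w1=1.5, w2=0.0)
print(a.times(b))  # LexTropical(w1=3.0, w2=2.0)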

Cited by 7 publications (6 citation statements) · References 24 publications
“…Russian, Spanish, Italian and Portuguese saw error rate reductions in the 50-60% range. While accuracy in Russian is slightly lower than Sproat et al [16] showed, our approach does not require any language-specific feature engineering.…”
Section: Grapheme Features in Stress Prediction
Mentioning confidence: 69%
“…Sproat et al [16] show that models can be trained to predict stress placement given spelling, following Dou et al [17] who report numbers on both stand-alone stress prediction as well as joint phoneme and stress prediction accuracy. Dou et al [17] report numbers over four European languages and use the CELEX2 [18] lexicon.…”
Section: Grapheme-to-Phoneme Prediction
Mentioning confidence: 99%
“…As a consequence, much of the subsequent work on applying machine learning to text normalization for speech applications focuses on specific semiotic classes, like letter sequences (Sproat and Hall 2014), abbreviations (Roark and Sproat 2014), or cardinal numbers (Gorman and Sproat 2016). In fact, Kestrel (Ebden and Sproat 2014) uses a machine-learned morphosyntactic tagger for Russian.…”
Mentioning confidence: 99%
“…Pruning could still be done for acronym pronunciation variants. A classification algorithm based on Maximum Entropy rankers has previously been implemented to determine whether acronyms are pronounced as a word, a letter sequence, or a mix of both [23].…”
Section: Conclusion and Discussion
Mentioning confidence: 99%