2014
DOI: 10.1017/s1351324914000175

The Kestrel TTS text normalization system

Abstract: This paper describes the Kestrel text normalization system, a component of the Google text-to-speech synthesis (TTS) system. At the core of Kestrel are text-normalization grammars that are compiled into libraries of weighted finite-state transducers (WFSTs). While the use of WFSTs for text normalization is itself not new, Kestrel differs from previous systems in its separation of the initial tokenization and classification phase of analysis from verbalization. Input text is first tokenized and different tokens …
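To make the WFST machinery concrete, here is a toy verbalization grammar written with the open-source Pynini library. This is a minimal sketch, not Kestrel's actual grammars: per the abstract, Kestrel keeps tokenization/classification separate from verbalization, and its rules cover far more than the invented `digit`/`measure`/`verbalize` names below.

```python
# Minimal sketch of a verbalization WFST in the spirit of Kestrel, using
# the open-source Pynini library. Illustrative toy only; not Kestrel's
# actual grammars.
import pynini

# Map a single ASCII digit to its word form.
digit = pynini.union(*(pynini.cross(str(i), word) for i, word in enumerate(
    "zero one two three four five six seven eight nine".split()))).optimize()

# Read a digit string digit-by-digit, with spaces inserted between words.
digits = digit + pynini.closure(pynini.cross("", " ") + digit)

# Verbalize a toy MEASURE token such as "2mA".
measure = (digits + pynini.cross("mA", " milliamperes")).optimize()

def verbalize(text: str, fst: pynini.Fst) -> str:
    """Compose the input with the grammar and read off the best output."""
    lattice = pynini.compose(text, fst)
    return pynini.shortestpath(lattice).project("output").string()

print(verbalize("2mA", measure))   # -> "two milliamperes"
print(verbalize("25mA", measure))  # -> "two five milliamperes"
```

Composing the input string with the grammar yields a lattice of candidate readings; taking the shortest (best-weighted) path selects the verbalization.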

Cited by 63 publications (48 citation statements)
References 30 publications
“…enough to suggest that such methods may prove useful when extended to all input classes. For our experimental data we used the training and test data for the 2017 Kaggle competition on text normalization [11], which consisted of ten million tokens of English Wikipedia text verbalized using the Kestrel text normalization system [2] for training data, and one million for testing. As described below, for some experiments we trained and tested on just measure and money expressions.…”
Section: Methods (mentioning)
confidence: 99%
“…For example, Train 540 leaves at 4:45 might be read as train five forty leaves at four forty five, in American English. One traditional approach to this problem involves hand-built finite-state grammars [2], but more recently there has been interest in neural approaches to the problem [3,4,5,6]. While neural methods perform well overall, they have a tendency on occasion to produce highly misleading output, such as reading 2mA as two million liters [3].…”
Section: Introduction (mentioning)
confidence: 99%
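The context-dependent readings in that example are easy to make concrete. The following plain-Python sketch (invented helper names, not code from any cited system) shows why a normalizer must classify a token before verbalizing it: the same digits are read one way as a clock time and another way as a train number.

```python
# Illustrative sketch: the same digits verbalize differently depending on
# the token's semiotic class, which is why normalization systems classify
# tokens before verbalizing them.
ONES = "zero one two three four five six seven eight nine".split()
TENS = ["", "ten", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]

def two_digits(n: int) -> str:
    """Read 0-99 as in clock times and train numbers ('45' -> 'forty five')."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"

def read_time(t: str) -> str:
    """TIME class: '4:45' -> 'four forty five'."""
    hours, minutes = t.split(":")
    return f"{two_digits(int(hours))} {two_digits(int(minutes))}"

def read_train_number(d: str) -> str:
    """Serial-number style for 3 digits: '540' -> 'five forty'."""
    return f"{ONES[int(d[0])]} {two_digits(int(d[1:]))}"

print(read_train_number("540"))  # five forty
print(read_time("4:45"))         # four forty five
```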
“…But these kinds of "silly" errors are errors that neural models trained on these sorts of data will make, as we demonstrate below. In contrast, a well-constructed hand-built system such as [3] might be brittle and fail for many classes of cases, but it would not produce errors of the kind we have just described. So one can have high overall accuracy, but a few bad errors of the kind just described make the system unusable.…”
Citation type: mentioning
confidence: 94%
“…The front-end of TTS aims to extract various linguistic and phonetic features from the raw text, in order to improve the naturalness and intelligibility of the synthesized speech. The front-end of a Mandarin TTS system contains a series of natural language processing (NLP) modules, including text normalization (TN) [6], Chinese word segmentation (CWS) [7], part-of-speech (POS) tagging [8], polyphone disambiguation (PPD) [9] and prosodic structure (PS) prediction [10], etc.…”
Section: Introduction (mentioning)
confidence: 99%
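As a rough illustration of such a front-end, the sketch below chains the listed modules as stages over a shared utterance record. All stage names and bodies are hypothetical stubs; in a real system each stage is a separately trained statistical or neural model.

```python
# Schematic Mandarin TTS front-end pipeline. Hypothetical stage names and
# stub bodies; shown only to make the module chain above concrete.
def text_normalization(utt: dict) -> dict:
    """TN: rewrite digits/symbols into characters, e.g. '5元' -> '五元'."""
    utt["text"] = utt["raw"]          # stub
    return utt

def word_segmentation(utt: dict) -> dict:
    """CWS: split the character stream into words."""
    utt["words"] = list(utt["text"])  # stub: character-level fallback
    return utt

def pos_tagging(utt: dict) -> dict:
    """POS: tag each word; prosody prediction consumes these tags."""
    utt["pos"] = ["x"] * len(utt["words"])  # stub
    return utt

def polyphone_disambiguation(utt: dict) -> dict:
    """PPD: choose the correct pinyin for polyphonic characters."""
    utt["pinyin"] = list(utt["words"])      # stub
    return utt

def prosodic_structure(utt: dict) -> dict:
    """PS: predict prosodic word/phrase boundaries for natural phrasing."""
    utt["prosody"] = []                     # stub
    return utt

FRONT_END = [text_normalization, word_segmentation, pos_tagging,
             polyphone_disambiguation, prosodic_structure]

def run_front_end(raw: str) -> dict:
    utt = {"raw": raw}
    for stage in FRONT_END:
        utt = stage(utt)
    return utt

print(run_front_end("这台电脑花了我5000元").keys())
```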
“…NMT: the method takes as input the character representations extracted by the pre-trained encoder of an NMT model. 6. TB: the method that uses feature ensembling by concatenating the features of BERT and NMT.…”
Citation type: mentioning
confidence: 99%
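The TB feature ensemble described here amounts to concatenating the two encoders' per-token features. A minimal PyTorch sketch of that idea follows; the dimensions and the projection layer are illustrative assumptions, not the cited paper's configuration.

```python
# Feature ensemble by concatenation: fuse per-token BERT features with
# per-token features from a pre-trained NMT encoder, then project.
import torch
import torch.nn as nn

class FeatureEnsemble(nn.Module):
    def __init__(self, bert_dim: int = 768, nmt_dim: int = 512,
                 out_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(bert_dim + nmt_dim, out_dim)

    def forward(self, bert_feats: torch.Tensor,
                nmt_feats: torch.Tensor) -> torch.Tensor:
        # Both inputs: [batch, seq_len, dim]; concatenate on feature axis.
        fused = torch.cat([bert_feats, nmt_feats], dim=-1)
        return self.proj(fused)

# Toy usage with random tensors standing in for encoder outputs.
fuse = FeatureEnsemble()
bert_feats = torch.randn(2, 10, 768)
nmt_feats = torch.randn(2, 10, 512)
print(fuse(bert_feats, nmt_feats).shape)  # torch.Size([2, 10, 256])
```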