2014
DOI: 10.1017/s1351324914000175

The Kestrel TTS text normalization system

Abstract: This paper describes the Kestrel text normalization system, a component of the Google text-to-speech synthesis (TTS) system. At the core of Kestrel are text-normalization grammars that are compiled into libraries of weighted finite-state transducers (WFSTs). While the use of WFSTs for text normalization is itself not new, Kestrel differs from previous systems in its separation of the initial tokenization and classification phase of analysis from verbalization. Input text is first tokenized and different tokens …
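To make the WFST machinery concrete, here is a toy verbalization grammar written with the open-source Pynini library. This is a minimal sketch, not Kestrel's actual grammars: per the abstract, Kestrel keeps tokenization/classification separate from verbalization, and its rules cover far more than the invented `digit`/`measure`/`verbalize` names below.

```python
# Minimal sketch of a verbalization WFST in the spirit of Kestrel, using
# the open-source Pynini library. Illustrative toy only; not Kestrel's
# actual grammars.
import pynini

# Map a single ASCII digit to its word form.
digit = pynini.union(*(pynini.cross(str(i), word) for i, word in enumerate(
    "zero one two three four five six seven eight nine".split()))).optimize()

# Read a digit string digit-by-digit, with spaces inserted between words.
digits = digit + pynini.closure(pynini.cross("", " ") + digit)

# Verbalize a toy MEASURE token such as "2mA".
measure = (digits + pynini.cross("mA", " milliamperes")).optimize()

def verbalize(text: str, fst: pynini.Fst) -> str:
    """Compose the input with the grammar and read off the best output."""
    lattice = pynini.compose(text, fst)
    return pynini.shortestpath(lattice).project("output").string()

print(verbalize("2mA", measure))   # -> "two milliamperes"
print(verbalize("25mA", measure))  # -> "two five milliamperes"
```

Composing the input string with the grammar yields a lattice of candidate readings; taking the shortest (best-weighted) path selects the verbalization.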

Cited by 63 publications (48 citation statements)
References 30 publications
“…enough to suggest that such methods may prove useful when extended to all input classes. For our experimental data we used the training and test data for the 2017 Kaggle competition on text normalization [11], which consisted of ten million tokens of English Wikipedia text verbalized using the Kestrel text normalization system [2] for training data, and one million for testing. As described below, for some experiments we trained and tested on just measure and money expressions.…”
Section: Methods (mentioning)
confidence: 99%
“…For example, Train 540 leaves at 4:45 might be read as train five forty leaves at four forty five, in American English. One traditional approach to this problem involves hand-built finite-state grammars [2], but more recently there has been interest in neural approaches to the problem [3,4,5,6]. While neural methods perform well overall, they have a tendency on occasion to produce highly misleading output, such as reading 2mA as two million liters [3].…”
Section: Introduction (mentioning)
confidence: 99%
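The context-dependent readings in that example are easy to make concrete. The following plain-Python sketch (invented helper names, not code from any cited system) shows why a normalizer must classify a token before verbalizing it: the same digits are read one way as a clock time and another way as a train number.

```python
# Illustrative sketch: the same digits verbalize differently depending on
# the token's semiotic class, which is why normalization systems classify
# tokens before verbalizing them.
ONES = "zero one two three four five six seven eight nine".split()
TENS = ["", "ten", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]

def two_digits(n: int) -> str:
    """Read 0-99 as in clock times and train numbers ('45' -> 'forty five')."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"

def read_time(t: str) -> str:
    """TIME class: '4:45' -> 'four forty five'."""
    hours, minutes = t.split(":")
    return f"{two_digits(int(hours))} {two_digits(int(minutes))}"

def read_train_number(d: str) -> str:
    """Serial-number style for 3 digits: '540' -> 'five forty'."""
    return f"{ONES[int(d[0])]} {two_digits(int(d[1:]))}"

print(read_train_number("540"))  # five forty
print(read_time("4:45"))         # four forty five
```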
“…But these kinds of "silly" errors are errors that neural models trained on these sorts of data will make, as we demonstrate below. In contrast, a well-constructed hand-built system such as [3] might be brittle and fail for many classes of cases, but it would not produce errors of the kind we have just described. So one can have high overall accuracy, but a few bad errors of the kind just described make the system unusable.…”
Citation type: mentioning
confidence: 94%
“…The front-end of TTS aims to extract various linguistic and phonetic features from the raw text, in order to improve the naturalness and intelligibility of the synthesized speech. The front-end of a Mandarin TTS system contains a series of natural language processing (NLP) modules, including text normalization (TN) [6], Chinese word segmentation (CWS) [7], part-of-speech (POS) tagging [8], polyphone disambiguation (PPD) [9] and prosodic structure (PS) prediction [10], etc.…”
Section: Introduction (mentioning)
confidence: 99%
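As a rough illustration of such a front-end, the sketch below chains the listed modules as stages over a shared utterance record. All stage names and bodies are hypothetical stubs; in a real system each stage is a separately trained statistical or neural model.

```python
# Schematic Mandarin TTS front-end pipeline. Hypothetical stage names and
# stub bodies; shown only to make the module chain above concrete.
def text_normalization(utt: dict) -> dict:
    """TN: rewrite digits/symbols into characters, e.g. '5元' -> '五元'."""
    utt["text"] = utt["raw"]          # stub
    return utt

def word_segmentation(utt: dict) -> dict:
    """CWS: split the character stream into words."""
    utt["words"] = list(utt["text"])  # stub: character-level fallback
    return utt

def pos_tagging(utt: dict) -> dict:
    """POS: tag each word; prosody prediction consumes these tags."""
    utt["pos"] = ["x"] * len(utt["words"])  # stub
    return utt

def polyphone_disambiguation(utt: dict) -> dict:
    """PPD: choose the correct pinyin for polyphonic characters."""
    utt["pinyin"] = list(utt["words"])      # stub
    return utt

def prosodic_structure(utt: dict) -> dict:
    """PS: predict prosodic word/phrase boundaries for natural phrasing."""
    utt["prosody"] = []                     # stub
    return utt

FRONT_END = [text_normalization, word_segmentation, pos_tagging,
             polyphone_disambiguation, prosodic_structure]

def run_front_end(raw: str) -> dict:
    utt = {"raw": raw}
    for stage in FRONT_END:
        utt = stage(utt)
    return utt

print(run_front_end("这台电脑花了我5000元").keys())
```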
“…NMT: the method takes as input the character representations extracted by the pre-trained encoder of an NMT model. 6. TB: the method that uses feature ensembling by concatenating the features of BERT and NMT.…”
Citation type: mentioning
confidence: 99%
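The TB feature ensemble described here amounts to concatenating the two encoders' per-token features. A minimal PyTorch sketch of that idea follows; the dimensions and the projection layer are illustrative assumptions, not the cited paper's configuration.

```python
# Feature ensemble by concatenation: fuse per-token BERT features with
# per-token features from a pre-trained NMT encoder, then project.
import torch
import torch.nn as nn

class FeatureEnsemble(nn.Module):
    def __init__(self, bert_dim: int = 768, nmt_dim: int = 512,
                 out_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(bert_dim + nmt_dim, out_dim)

    def forward(self, bert_feats: torch.Tensor,
                nmt_feats: torch.Tensor) -> torch.Tensor:
        # Both inputs: [batch, seq_len, dim]; concatenate on feature axis.
        fused = torch.cat([bert_feats, nmt_feats], dim=-1)
        return self.proj(fused)

# Toy usage with random tensors standing in for encoder outputs.
fuse = FeatureEnsemble()
bert_feats = torch.randn(2, 10, 768)
nmt_feats = torch.randn(2, 10, 512)
print(fuse(bert_feats, nmt_feats).shape)  # torch.Size([2, 10, 256])
```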