“…The text processing front-end, which handles tasks such as text normalization, grapheme-to-phoneme (G2P) conversion, and phrase break prediction, has become a core component of modern text-to-speech (TTS) systems. Many studies have shown that these text processing modules improve the naturalness of synthetic speech through a variety of approaches, including traditional statistical methods [1][2][3] and deep learning-based methods [4][5][6][7][8][9]. Recently, following the great success of BERT [10] on various natural language processing (NLP) tasks, most proposed works have adopted mainstream pre-trained language models (PLMs) [10][11][12][13].…”