9th ISCA Speech Synthesis Workshop (SSW 9), 2016
DOI: 10.21437/ssw.2016-26

Mandarin Prosodic Phrase Prediction based on Syntactic Trees

Abstract: Prosodic phrases (PPs) are important for Mandarin Text-To-Speech systems. Most existing PP detection methods need large manually annotated corpora to learn the models. In this paper, we propose a rule-based method to predict PP boundaries employing the syntactic information of a sentence. The method is based on the observation that a prosodic phrase is a meaningful segment of a sentence with length restrictions. A syntactic structure allows a sentence to be segmented according to grammar. We add some le…
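To make the abstract's idea concrete, below is a minimal sketch of boundary prediction by recursive, length-limited segmentation of a constituency tree: a constituent short enough to satisfy the length restriction becomes one prosodic phrase, otherwise its children are segmented on their own, so PP boundaries fall on syntactic constituent edges. The `Node` class, `predict_pp` function, and `max_len` threshold are illustrative assumptions for this sketch, not the paper's actual rule set.

```python
# Sketch: length-limited, top-down segmentation of a syntactic tree into
# prosodic phrases (PPs). Tree format and threshold are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                           # constituent label, e.g. "NP", "VP"
    children: List["Node"] = field(default_factory=list)
    word: str = ""                       # non-empty only for leaf (word) nodes

    def words(self) -> List[str]:
        if not self.children:
            return [self.word]
        return [w for c in self.children for w in c.words()]


def predict_pp(node: Node, max_len: int = 7) -> List[List[str]]:
    """Return a list of prosodic phrases (each a list of words)."""
    words = node.words()
    # A short-enough constituent forms a single prosodic phrase.
    if len(words) <= max_len or not node.children:
        return [words]
    # Otherwise recurse: each child constituent is segmented on its own,
    # so every PP boundary coincides with a constituent boundary.
    phrases: List[List[str]] = []
    for child in node.children:
        phrases.extend(predict_pp(child, max_len))
    return phrases


if __name__ == "__main__":
    # Toy tree: (IP (NP the little cat) (VP chased (NP the red ball)))
    tree = Node("IP", [
        Node("NP", [Node("", word=w) for w in ["the", "little", "cat"]]),
        Node("VP", [
            Node("", word="chased"),
            Node("NP", [Node("", word=w) for w in ["the", "red", "ball"]]),
        ]),
    ])
    print(predict_pp(tree, max_len=4))
    # -> [['the', 'little', 'cat'], ['chased', 'the', 'red', 'ball']]
```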

Cited by 10 publications (4 citation statements) · References 11 publications
“…Text Normalization: rule-based [311], neural-based [310,223,406,430], hybrid [432]; Word Segmentation [394,444,261]; POS Tagging [292,323,221,444,135]; Prosody Prediction [50,405,312,186,137,322,277,62,440,210,212,3]; Grapheme to Phoneme: N-gram [41,24], neural-based [403,283,33,320]; Polyphone Disambiguation [441,392,224,295,321,29,257] … and then neural networks are leveraged to model text normalization as a sequence-to-sequence task where the source and target sequences are non-standard words and spoken-form words respectively [310,223,430]. Recently, some works [432] propose to combine the advantages of both rule-based and neural-based models to further improve the performance of text normalization.…”
Section: Task Research Work (mentioning) · Confidence: 99%
“…The text front-end structure of other languages is similar to that of Mandarin. These components are usually modeled by traditional statistical methods, such as syntactic trees [264] and CRF [167] based methods for PSP tasks and dictionary matching based methods [77] for pronunciation prediction tasks. However, these traditional text front-ends often fail to predict correctly in some unusual or complex contexts.…”
Section: Text Front-end (mentioning) · Confidence: 99%
“…Conventionally, linguistic information including lexical features (e.g., part-of-speech tags) and syntax features (e.g., distance from punctuation) is used for this task. Machine learning methods are used in phrasing models, such as decision tree algorithms [2][3][4][5][6][7][8], hidden Markov models [9][10][11], and conditional random fields [3,12]. Due to the development of natural language processing (NLP) and deep learning technologies, word representations have become the key linguistic feature.…”
Section: Introduction (mentioning) · Confidence: 99%
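As a companion illustration of the sequence-labeling view mentioned in the last citation statement, the sketch below casts PP boundary prediction as per-word B/NB tagging with a linear-chain CRF. The feature function, the B/NB label scheme, and the toy data are assumptions made for this illustration; they are not the feature sets or corpora used in the cited works.

```python
# Sketch: prosodic-phrase boundary prediction as CRF sequence labeling.
# Requires: pip install sklearn-crfsuite
import sklearn_crfsuite


def word_features(sent, i):
    """Simple per-token features: the word itself and its neighbors."""
    feats = {"word": sent[i], "is_last": i == len(sent) - 1}
    if i > 0:
        feats["prev_word"] = sent[i - 1]
    if i < len(sent) - 1:
        feats["next_word"] = sent[i + 1]
    return feats


def sent_to_features(sent):
    return [word_features(sent, i) for i in range(len(sent))]


# Toy training data: "B" marks a word followed by a PP boundary, "NB" not.
train_sents = [
    ["今天", "天气", "很", "好"],
    ["我们", "去", "公园", "散步"],
]
train_labels = [
    ["NB", "B", "NB", "B"],
    ["B", "NB", "NB", "B"],
]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit([sent_to_features(s) for s in train_sents], train_labels)

test_sent = ["今天", "我们", "去", "公园"]
print(crf.predict([sent_to_features(test_sent)])[0])  # e.g. ['NB', 'B', 'NB', 'B']
```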