Speech Prosody 2018
DOI: 10.21437/speechprosody.2018-124

Paragraph Prosodic Patterns to Enhance Text-to-Speech Naturalness

Abstract: Speech synthesis has reached reasonably high quality in recent years. However, there is still room for improvement in naturalness and expressiveness when dealing with large multisentential discourse, since most text-to-speech synthesizers do not fully take into account the prosodic differences that have been observed across discourse units such as paragraphs. This work presents an implementation of paragraph-based prosodic patterns in the open-source MARYTTS platform, enriching its prosody output by m…

Cited by 7 publications (8 citation statements). References: 14 publications (27 reference statements).
“…More experiments need to be carried out to figure out the optimal criteria for the potential detection of paratone boundaries, whether based on raw Momel pitch extractions or symbolical INTSINT labels. We have not taken into account "intraparagraph features" as reported in [28] but we spotted potential candidates. Explicit enumeration discourse markers ("first", "second", "third") were not necessarily realised as autonomous initial paratone boundaries.…”
Section: Discussion (mentioning)
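The excerpt above weighs detecting paratone boundaries from raw Momel pitch targets against symbolic INTSINT labels. As a minimal sketch of the first option, assuming that a paratone onset shows up as a pitch reset (the first pitch target of a sentence lying well above the last target of the preceding sentence), candidate boundaries can be flagged from per-sentence lists of (time, F0) targets. The input format, the function names and the 4-semitone threshold are hypothetical; this is not the procedure used in the cited work.

```python
import math
from typing import List, Tuple

# A pitch target as produced by a Momel-style stylisation: (time in s, F0 in Hz).
Target = Tuple[float, float]


def semitones(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz up to f_hz, in semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)


def paratone_candidates(sentences: List[List[Target]],
                        reset_st: float = 4.0) -> List[int]:
    """Return indices of sentences whose first target lies at least `reset_st`
    semitones above the last target of the previous sentence (a pitch reset).
    Sentence 0 is never returned, since it has no predecessor to compare with.
    The 4-semitone default is a hypothetical threshold, not a tuned value."""
    candidates = []
    for i in range(1, len(sentences)):
        prev, cur = sentences[i - 1], sentences[i]
        if not prev or not cur:
            continue  # skip sentences without any voiced targets
        prev_end = max(prev, key=lambda t: t[0])[1]  # F0 of the latest target
        cur_start = min(cur, key=lambda t: t[0])[1]  # F0 of the earliest target
        if semitones(cur_start, prev_end) >= reset_st:
            candidates.append(i)
    return candidates


if __name__ == "__main__":
    text = [
        [(0.1, 180.0), (0.8, 150.0), (1.5, 110.0)],
        [(1.9, 120.0), (2.6, 105.0)],   # ~1.5 st above 110 Hz: no reset
        [(3.2, 190.0), (3.9, 140.0)],   # ~10.3 st above 105 Hz: reset
    ]
    print(paratone_candidates(text))  # → [2]
```

A symbolic variant would replace the semitone comparison with a test on INTSINT labels (for example, a sentence starting on a high or upward-stepping label), at the cost of depending on the quality of the labelling.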
“…Encoding discourse structure in TTS systems is still a relatively unexplored field. Recent work has focused on generic paragraph-based features [16,17]. In this work we propose an approach to encode DR information in neural statistical parametric speech synthesis (SPSS).…”
Section: Related Work (mentioning)
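The excerpt does not say how discourse relation (DR) information is fed to the synthesizer. One straightforward way, shown here only as an illustrative sketch, is to turn the relation a sentence holds with its predecessor into a one-hot vector and append it to every frame- or phone-level input vector of that sentence. The relation inventory (the four top-level PDTB senses plus a "None" slot) and all function names are assumptions, not the encoding of the cited work.

```python
from typing import List, Optional

# Hypothetical relation inventory: the four top-level PDTB senses plus a
# "None" slot for sentences with no (or an unrecognised) relation label.
RELATIONS = ["Comparison", "Contingency", "Expansion", "Temporal", "None"]


def encode_relation(relation: Optional[str]) -> List[float]:
    """One-hot encode a DR label; unknown or missing labels map to "None"."""
    vec = [0.0] * len(RELATIONS)
    key = relation if relation in RELATIONS else "None"
    vec[RELATIONS.index(key)] = 1.0
    return vec


def append_dr_features(frame_features: List[List[float]],
                       relation: Optional[str]) -> List[List[float]]:
    """Append the sentence-level DR one-hot to each input vector of that
    sentence, so the acoustic model sees it as a constant conditioning input."""
    dr = encode_relation(relation)
    return [row + dr for row in frame_features]


if __name__ == "__main__":
    print(append_dr_features([[0.5, 0.1]], "Contingency"))
    # → [[0.5, 0.1, 0.0, 1.0, 0.0, 0.0, 0.0]]
```

Repeating a sentence-level label on every frame is the simplest conditioning scheme; a learned embedding per relation would be a natural alternative in a neural model.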
“…More recently, efforts have been made to use additional text embeddings derived from pre-trained LMs, such as the BERT model [25], to improve the modelling of prosody [26][27][28]. Moreover, prosody modelling with texts consist of multiple sentences have also been studied for SPSS [29][30][31][32][33][34]. Apart from sentence positions [31][32][33], discourse relations (DRs), which describe the logical relationship between two discourse units like sentences, are also used to improve the prosody generation [29,30,34].…”
Section: Introduction (mentioning)
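The sentence-position information mentioned in this excerpt can be illustrated with a small sketch that gives each sentence its paragraph index, its index within the paragraph, a relative position in [0, 1], and first/last flags. The feature names and this particular feature set are hypothetical; they are not taken from [31][32][33].

```python
from typing import Dict, List


def sentence_position_features(paragraphs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one feature dict per sentence, in reading order.
    `paragraphs` is a list of paragraphs, each a list of sentence strings."""
    feats = []
    for p_idx, sentences in enumerate(paragraphs):
        n = len(sentences)
        for s_idx in range(n):
            feats.append({
                "paragraph_index": float(p_idx),
                "sentence_index": float(s_idx),
                # 0.0 for the first sentence, 1.0 for the last one.
                "relative_position": s_idx / (n - 1) if n > 1 else 0.0,
                "is_first": float(s_idx == 0),
                "is_last": float(s_idx == n - 1),
            })
    return feats


if __name__ == "__main__":
    for row in sentence_position_features([["A.", "B.", "C."], ["D."]]):
        print(row)
```

A one-sentence paragraph is both first and last, which lets the model treat it as a paragraph onset and a paragraph close at once.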