Speech Prosody 2016
DOI: 10.21437/speechprosody.2016-235

Paragraph-based prosodic cues for speech synthesis applications

Abstract: Speech synthesis has improved in both expressiveness and voice quality in recent years. However, obtaining full expressiveness when dealing with large multi-sentential synthesized discourse is still a challenge, since speech synthesizers do not take into account the prosodic differences that have been observed in discourse units such as paragraphs. The current study validates and extends previous work by analyzing the prosody of paragraph units in a large and diverse corpus of TED Talks using automatically ext…

Cited by 15 publications (30 citation statements)
References 30 publications (42 reference statements)
“…The corpus consists of 1046 talks by 884 English speakers, uttering a total amount of 156034 sentences. The corresponding transcripts, as well as audio and video files, are available on TED's website; they were created by volunteers and include punctuation and paragraph breaks [12]. The subtitle timings of TED transcripts do not always correspond to sentences in the transcript.…”
Section: Data (mentioning)
confidence: 99%
“…To overcome this limitation, precise word timings were first obtained through Viterbi forced alignment using an automatic speech recognition system. The word timings were then further used to automatically obtain sentence boundaries and thus sentence timings [12].…”
Section: Data (mentioning)
confidence: 99%
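
The step from word timings to sentence timings described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' pipeline: it assumes the forced alignment has already produced one `(word, start, end)` tuple per transcript word, in transcript order, and that the number of words per sentence is known from the punctuated transcript. The names `WordTiming` and `sentence_timings` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record for one aligned word: (word, start_sec, end_sec).
WordTiming = Tuple[str, float, float]


def sentence_timings(
    words: List[WordTiming], sentence_lengths: List[int]
) -> List[Tuple[float, float]]:
    """Return one (start, end) span per sentence.

    Assumes `words` is in transcript order and that `sentence_lengths`
    gives the number of aligned words in each consecutive sentence.
    """
    if any(n <= 0 for n in sentence_lengths):
        raise ValueError("every sentence must contain at least one word")
    if sum(sentence_lengths) != len(words):
        raise ValueError("sentence lengths must cover every aligned word")

    spans = []
    offset = 0
    for n in sentence_lengths:
        chunk = words[offset:offset + n]
        # A sentence starts with its first word and ends with its last.
        spans.append((chunk[0][1], chunk[-1][2]))
        offset += n
    return spans
```

Calling `sentence_timings(words, [2, 2])` on four aligned words yields two spans, each running from the start of the sentence's first word to the end of its last.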
“…Based on previous analysis of paragraph prosody [18], we calculated aggregate statistics for each sentence: mean, standard deviation, maximum, minimum, median, slope, range (99th-1st quantiles). We also record the values for the previous and next sentences, as well as their differences to the target, and the difference between the first and last word of the target.…”
Section: Prosodic Features (mentioning)
confidence: 99%
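
As a rough sketch of the per-sentence aggregates listed in the statement above, the code below computes them for one sequence of frame-level values (for example, pitch in Hz). Several details are assumptions rather than anything the statement specifies: the population standard deviation, a least-squares slope against frame index, linear-interpolation percentiles, and the sign convention (neighbour minus target) for the differences. All function names are hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def slope(values: List[float]) -> float:
    """Least-squares slope of the values against their frame index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def sentence_stats(values: List[float]) -> Dict[str, float]:
    """Aggregate statistics for one sentence's (non-empty) feature track."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),  # assumption: population std
        "max": max(values),
        "min": min(values),
        "median": statistics.median(values),
        "slope": slope(values),
        "range": percentile(values, 99) - percentile(values, 1),
    }


def neighbour_differences(
    per_sentence: List[Dict[str, float]], index: int, key: str
) -> Dict[str, Optional[float]]:
    """Differences of the previous and next sentence to the target.

    Assumption: difference = neighbour - target; None at talk edges.
    """
    target = per_sentence[index][key]
    prev_diff = per_sentence[index - 1][key] - target if index > 0 else None
    next_diff = (
        per_sentence[index + 1][key] - target
        if index + 1 < len(per_sentence)
        else None
    )
    return {"prev_diff": prev_diff, "next_diff": next_diff}
```

The 99th-minus-1st quantile range is less sensitive to isolated tracking errors than `max - min`, which is presumably why the statement defines range that way.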
“…Similarly, prosodic features based on pitch, energy and timing have been used to perform topic segmentation on their own [13,14,15] or in conjunction with lexical features [8,12,16,17]. While pause duration appears to be the most robust segmentation cue, paragraphs also seem to follow general prosodic declination and reset patterns [18]. So, we expect prosody to be informative of paragraph breaks.…”
Section: Introduction (mentioning)
confidence: 99%
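
To make the pause-duration cue mentioned above concrete, here is a small sketch that measures the silence between consecutive sentences and flags long pauses as candidate paragraph breaks. It is only an illustration under stated assumptions: sentence spans are `(start, end)` pairs in seconds, and the `min_pause` threshold of 1.0 s is an arbitrary placeholder, not a value taken from the cited work, which does not describe a thresholding method in this statement.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec) of one sentence


def pause_durations(spans: List[Span]) -> List[float]:
    """Silence between each sentence and the next, clipped at zero."""
    return [max(0.0, nxt[0] - cur[1]) for cur, nxt in zip(spans, spans[1:])]


def candidate_breaks(spans: List[Span], min_pause: float = 1.0) -> List[int]:
    """Indices of sentences preceded by a pause of at least `min_pause`.

    `min_pause` is a hypothetical threshold chosen for illustration only.
    """
    return [
        i + 1
        for i, pause in enumerate(pause_durations(spans))
        if pause >= min_pause
    ]
```

In practice a fixed threshold is a blunt instrument; the statement above suggests combining pause duration with pitch declination and reset patterns rather than relying on it alone.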