Proceedings of the Paralinguistic Information and Its Integration in Spoken Dialogue Systems Workshop 2011
DOI: 10.1007/978-1-4614-1335-6_13
Conversational Speech Synthesis System with Communication Situation Dependent HMMs

Cited by 5 publications (5 citation statements) | References 15 publications
“…For the data size issue, conventional studies on dialogue-style TTS had difficulty collecting large-scale, high-quality speech corpora. For example, the training data-set sizes were 25 min in [9] and 558 sentences in [7]; however, our maximum training data size was 433 min (14,179 sentences). We will discuss the data size again in Section 6.…”
Section: Non-Monologue Speech Synthesis
Confidence: 99%
“…In contrast, most previous studies on dialogue-oriented TTS used very small data-sets. Specifically, the sizes were 25 min, [9] 558 sentences, [7] and 1200 sentences. [21] Generally speaking, it is difficult to build a dialogue-oriented TTS system with such a small data-set.…”
Section: Data Size
Confidence: 99%
“…Emotional speech synthesis is a technique for diversifying the expression of speech synthesis (Qin et al., 2006; Schröder, 2001; Yang et al., 2018). It specifies both emotional parameters and the text input so that the speech reflects the designated emotion (Charfuelan and Steiner, 2013; Inoue et al., 2017; Iwata and Kobayashi, 2011; Nose and Kobayashi, 2013).…”
Section: Introduction
Confidence: 99%