Proceedings of the Paralinguistic Information and Its Integration in Spoken Dialogue Systems Workshop 2011
DOI: 10.1007/978-1-4614-1335-6_13
Conversational Speech Synthesis System with Communication Situation Dependent HMMs

Cited by 5 publications (5 citation statements) | References 15 publications
“…For the data size issue, conventional studies on dialogue-style TTS had difficulty collecting large-scale, high-quality speech corpora. For example, the training data-set sizes were 25 min in [9] and 558 sentences in [7]; however, our maximum training data size was 433 min (14,179 sentences). We will discuss the data size again in Section 6.…”
Section: Non-Monologue Speech Synthesis
Confidence: 99%
“…In contrast, most previous studies on dialogue-oriented TTS used very small data-sets. Specifically, the sizes were 25 min, [9] 558 sentences, [7] and 1200 sentences. [21] Generally speaking, it is difficult to build a dialogue-oriented TTS system with such a small data-set.…”
Section: Data Size
Confidence: 99%
“…Emotional speech synthesis is a technique for diversifying the expression of speech synthesis (Qin et al., 2006; Schröder, 2001; Yang et al., 2018). It specifies both emotional parameters and the text input so that the speech reflects the designated emotion (Charfuelan and Steiner, 2013; Inoue et al., 2017; Iwata and Kobayashi, 2011; Nose and Kobayashi, 2013).…”
Section: Introduction
Confidence: 99%