10th ISCA Workshop on Speech Synthesis (SSW 10) 2019
DOI: 10.21437/ssw.2019-19

Speech Synthesis Evaluation — State-of-the-Art Assessment and Suggestion for a Novel Research Program

Abstract: Speech synthesis applications have become an ubiquity, in navigation systems, digital assistants or as screen or audio book readers. Despite their impact on the acceptability of the systems in which they are embedded, and despite the fact that different applications probably need different types of TTS voices, TTS evaluation is still largely treated as an isolated problem. Even though there is strong agreement among researchers that the mainstream approaches to Text-to-Speech (TTS) evaluation are often insuffi…

Cited by 55 publications (44 citation statements)
References 27 publications
“…Even though recent innovations have been leading synthesis systems into producing human-like speech, the flexibility required to render natural human-like speech remains a difficult problem. Additionally, speech synthesis cannot be taken as a general problem with one solution fitting everyone [34]. This is the reason why synthesising expressive speech, as well as adapting the target speech to a given speaker, are the current hot challenges of the community.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Cambre and Kulkarni [2019] highlight the social implications of designing voices for smart devices and provide a research framework for designers to utilise to help shape users' experiences. Finally, Wagner et al [2019] discuss the future of evaluating speech synthesis, suggesting a move towards HCI-focused approaches of evaluating speech in appropriate contexts with users. Here, we build upon this existing work and present three challenges for those working in different areas of expressive synthesis.…”
Section: Current Challenges and Future Directions In Expressive Synthesis (citation type: mentioning)
confidence: 99%
“…Evaluating speech is currently done using three key approaches [Wagner et al 2019]: objective assessments classifying systems with particular scores or contrasting them with other speech (e.g. through mel-cepstral distortion (MCD) ratings); subjective assessments rating speech on concepts such as intelligibility and naturalness; and behavioural assessments examining user actions like task completion time or physiological arousal.…”
Section: More and Better User Evaluation Needed (citation type: mentioning)
confidence: 99%
“…Hinterleitner (2017) recently specified five dimensions of quality for TTS systems: naturalness of voice, prosodic quality, fluency and intelligibility, absence of disturbances, and calmness. While most synthesized voices have reached a high level of intelligibility, it is the perceived quality that still requires clarification (Polkosky and Lewis, 2003; Wagner et al, 2019). Moreover, the quality of the voice was shown to be important in establishing a positive human-robot relationship even when the content of the utterances was unintelligible (McGinn and Torre, 2019).…”
Section: Auditory Perception Of Humanoids (citation type: mentioning)
confidence: 99%