2020
DOI: 10.1109/taffc.2018.2828429

Can We Generate Emotional Pronunciations for Expressive Speech Synthesis?

Abstract: In the field of expressive speech synthesis, a lot of work has been conducted on suprasegmental prosodic features, while little has been done on pronunciation variants. However, prosody is highly related to the sequence of phonemes to be expressed. This article raises two issues in the generation of emotional pronunciations for TTS systems. The first issue consists in designing an automatic pronunciation generation method from text, while the second issue addresses the very existence of emotional pronunci…

Citations: cited by 8 publications (8 citation statements)
References: 38 publications
“…Emotion expressive speech is even more complex, which has subtle dynamic variations associated with multiple prosodic attributes [74], [75], [76]. Inspired by the successful attempts in prosody style control, several studies control the emotion intensity for emotional speech synthesis.…”
Section: Expressive Speech Synthesis With Prosody Style Control
Citation type: mentioning
confidence: 99%
“…It should be kept in mind, however, that forced alignment can only produce phone labels that are provided for by the pronunciation dictionary. Forced alignment and the pronunciation dictionary can be used to investigate different linguistic hypotheses and to analyze large corpora (Adda-Decker et al., 1999; Boula de Mareüil and Adda-Decker, 2002; Van Bael et al., 2007; Schuppler et al., 2014; Wu et al., 2017; Tahon et al., 2018). With this method, the absence or presence of the segment in question is decided automatically by forced alignment.…”
Section: Forced Alignment
Citation type: unclassified
“…Recent developments in TTS synthesis have improved the acoustic parameters that influence degree of perceived expressiveness of the voices, cf. [3]- [5]. Thus, one question is whether humans will display alignment to TTS productions that are realized with increased acoustic-phonetic expressiveness, since it conveys robust human-like dynamism in the voice.…”
Section: Human-computer Alignment
Citation type: mentioning
confidence: 99%
“…Some efforts to improve the perceived dynamism of TTS have focused on synthesizing acoustic expressiveness, based on human expressive vocal patterns, cf. [3]- [5]. Yet, how human users perceive and respond to these displays of expressiveness is an empirical question: do users respond to vocal expressiveness in voice-AI speech differently than non-expressive productions?…”
Section: Introduction
Citation type: mentioning
confidence: 99%