11th ISCA Speech Synthesis Workshop (SSW 11) 2021
DOI: 10.21437/ssw.2021-11

Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction

Cited by 8 publications (5 citation statements). References 12 publications.
“…• Emovox w/ Attention Weights (proposed): where the attention weight vector obtained from a pre-trained SER is used to represent the intensity [18];…”
Section: Reference Methods and Setups (mentioning)
confidence: 99%
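The excerpt above says that an attention weight vector from a pre-trained speech emotion recognizer (SER) serves as the intensity representation. The sketch below shows where such a vector can come from, under stated assumptions. It assumes an SER that pools frame-level features with dot-product attention against a learned query vector. The function names, the dot-product scoring, and the toy values are all hypothetical and are not taken from [18]. The per-frame softmax weights are what would be passed on as a frame-level intensity input.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical SER pooling layer: dot-product attention over frames.

    Returns (pooled utterance vector, per-frame attention weights).
    The weights sum to 1; under the assumption described above, this
    weight vector is what would be reused as an emotion intensity signal.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # One relevance score per frame: dot product with the (assumed) learned query.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted average of the frame features gives the utterance embedding.
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]  # toy 2-D frame features
pooled, weights = attention_pool(frames, query=[1.0, 0.0])
```

In this toy input, the third frame aligns best with the query, so it receives the largest weight. Frames with higher weights would be read as more emotionally salient.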
“…For example, in [19], an inter-to-intra distance ratio algorithm is applied to the learnt style tokens for emotional speech synthesis, where an interpolation technique is used to control emotion intensity. In [18], the authors show that a speech emotion recognizer is capable of generating a meaningful intensity representation via attention or saliency. In [77], [78], a relative attribute scheme is introduced to learn the emotion intensity for emotional speech synthesis.…”
Section: Expressive Speech Synthesis With Prosody Style Control (mentioning)
confidence: 99%
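The excerpt above mentions controlling emotion intensity by interpolating learnt style representations. The sketch below assumes the simplest form of this idea: linear interpolation between a neutral and an emotional style embedding, weighted by an intensity scalar `alpha`. It is a generic illustration and not the specific scheme of [19]. The function name and embedding values are hypothetical.

```python
from typing import List, Sequence


def interpolate_style(
    neutral: Sequence[float], emotion: Sequence[float], alpha: float
) -> List[float]:
    """Linearly interpolate between a neutral and an emotional style embedding.

    alpha = 0 returns the neutral embedding and alpha = 1 the emotional one.
    Intermediate values are assumed to yield intermediate emotion intensity.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(neutral) != len(emotion):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - alpha) * n + alpha * e for n, e in zip(neutral, emotion)]


half_intensity = interpolate_style([0.0, 0.0], [2.0, 4.0], alpha=0.5)
```

Linear interpolation keeps the intensity control to a single scalar. However, it assumes that points between the two embeddings remain meaningful inputs for the synthesizer, which the embedding space does not guarantee.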
“…more subjective and challenging to model. Some studies use auxiliary features such as a state of voiced, unvoiced and silence (VUS) [86], attention weights or a saliency map [87] to control the emotion intensity. Other studies manipulate the internal emotion representations through interpolation [88], scaling [76] or distance-based quantization [89].…”
Section: Controllable Emotional Speech Synthesis (mentioning)
confidence: 99%
“…To the best of our knowledge, there is no existing deep learning-based SVS model that expresses emotions of varying intensities [9]. In the TTS field, many studies have been conducted to express types of emotions [10,11,12,13,14,15,16], but there are few studies to express the intensity of emotions [17,16].…”
Section: Introduction (mentioning)
confidence: 99%