2021
DOI: 10.1109/access.2021.3065460

Sequence-to-Sequence Emotional Voice Conversion With Strength Control

Abstract: This paper proposes an improved emotional voice conversion (EVC) method with emotional strength and duration controllability. EVC methods without duration mapping generate emotional speech with identical duration to that of the neutral input speech. In reality, even the same sentences would have different speeds and rhythms depending on the emotions. To solve this, the proposed method adopts a sequence-to-sequence network with an attention module that enables the network to learn attention in the neutral input…

Cited by 23 publications (32 citation statements)
References 54 publications (59 reference statements)
“…Studies have also revealed that the emotions can be expressed through universal principles that are shared across different individuals and cultures (Ekman, 1992; Manokara et al, 2021). This motivates the study of multispeaker (Shankar et al, 2019b, 2020), and speaker-independent emotional voice conversion (Zhou et al, 2020b; Choi and Hahn, 2021).…”
Section: Related Work Speech Emotion Conversion (mentioning)
confidence: 99%
“…There are only few studies on sequence-to-sequence emotional voice conversion [20], [42], [43], [59]. In [42], the authors jointly model pitch and duration with parallel data, where the model is conditioned on the syllable position in the phrase.…”
Section: Sequence-to-sequence Emotional Voice Conversion (mentioning)
confidence: 99%
“…One uses auxiliary features such as a state of voiced, unvoiced, and silence (VUS) [17], attention weights or a saliency map [18]. Another manipulates the internal emotion representations through interpolation [19] or scaling [20]. Despite these methods, emotion intensity control is still an under-explored topic in emotional voice conversion.…”
Section: Introduction (mentioning)
confidence: 99%
“…Such framework generally works well in speaker-dependent tasks. Studies have also revealed that the emotions can be expressed through some universal principles that are shared across different individuals and cultures [55,56,57], that motivates the study of multispeaker [58,59,54], and speaker-independent emotional voice conversion [60,61].…”
Section: Introduction (mentioning)
confidence: 99%