2019
DOI: 10.1109/taslp.2019.2923951

Emotional Voice Conversion Using Dual Supervised Adversarial Networks With Continuous Wavelet Transform F0 Features

Cited by 26 publications (21 citation statements)
References 31 publications
“…In general, both categorical and dimensional representations of emotions have been widely used in both emotion recognition [85] and emotional voice conversion [86,45,87,83]. The study on representation learning [70] represents a new way of emotion representation, that further calls for large scale emotion-annotated speech data.…”
Section: Emotion Representation (mentioning)
confidence: 99%
“…Meanwhile, emotional voice conversion mainly has done with frame-based conversion [11,12] or rule-based approach [13]. These have limitations since DTW does not ensure the exact alignment and rule-based approach has a limitation to model voice conversion.…”
Section: Related Work (mentioning)
confidence: 99%
“…As prosody plays an important role in expressing emotional speech, several studies have focused on modelling spectral and fundamental frequency (F0) features with parallel data. Some previous works have explored prosody and spectral mapping separately using GMM [14]- [16], FNN [17], deep belief network (DBN) [18], and GAN [19] methods. Ming et al [20] converted the spectrum and F0 simultaneously with bidirectional long-short term memory (LSTM) using parallel data.…”
Section: Introduction (mentioning)
confidence: 99%