ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683204
Adversarially Trained Autoencoders for Parallel-data-free Voice Conversion

Cited by 7 publications (5 citation statements)
References 8 publications
“…Saito et al. [33] proposed to use PPGs for improving VAE-based VC. Several studies proposed AE-based VC with adversarial learning of hidden representations against speaker information [36], [39], [40]. Polyak et al. [39] incorporated an attention module between the encoder and the decoder in a WaveNet-based AE.…”
Section: B. Auto-Encoder Based Voice Conversion (mentioning)
Confidence: 99%
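As a rough illustration of the adversarial scheme this statement describes, the sketch below (PyTorch; the module sizes, the speaker-embedding decoder conditioning, and the plain negative cross-entropy adversarial loss are assumptions, not the cited papers' exact setups) trains a speaker classifier on the encoder's content code while the autoencoder learns to reconstruct the input and fool that classifier, so speaker identity re-enters only through the decoder's explicit speaker input.

```python
# Sketch: autoencoder whose latent "content" code is trained adversarially
# against a speaker classifier. All dimensions and names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, CODE_DIM, N_SPEAKERS = 80, 64, 10  # assumed sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, CODE_DIM))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        # speaker identity is injected only here, via an embedding
        self.spk_emb = nn.Embedding(N_SPEAKERS, 16)
        self.net = nn.Sequential(nn.Linear(CODE_DIM + 16, 256), nn.ReLU(),
                                 nn.Linear(256, FEAT_DIM))
    def forward(self, z, spk_id):
        return self.net(torch.cat([z, self.spk_emb(spk_id)], dim=-1))

class SpeakerClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(CODE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_SPEAKERS))
    def forward(self, z):
        return self.net(z)

enc, dec, clf = Encoder(), Decoder(), SpeakerClassifier()
opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-4)

def train_step(x, spk_id, adv_weight=0.1):
    # 1) classifier step: learn to recognise the speaker from the content code
    z = enc(x).detach()
    loss_clf = F.cross_entropy(clf(z), spk_id)
    opt_clf.zero_grad(); loss_clf.backward(); opt_clf.step()

    # 2) autoencoder step: reconstruct well while fooling the classifier
    z = enc(x)
    recon = dec(z, spk_id)
    loss_rec = F.l1_loss(recon, x)
    loss_adv = -F.cross_entropy(clf(z), spk_id)  # one simple adversarial objective
    loss = loss_rec + adv_weight * loss_adv
    opt_ae.zero_grad(); loss.backward(); opt_ae.step()
    return loss_rec.item(), loss_clf.item()

# toy usage with random "frames"; at conversion time a different spk_id
# would be fed to the decoder while the content code is reused
x = torch.randn(32, FEAT_DIM)
spk = torch.randint(0, N_SPEAKERS, (32,))
print(train_step(x, spk))
```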
“…To strengthen the adversarial training, a secondary speaker classifier C_s is also applied to the outputs of the first LSTM layer in R. It is likewise trained with a classification loss L_s and passes back an adversarial loss L_adv. As indicated by Ocal et al. [21], the error rate of the optimal speaker classifier relates to an upper bound on the mutual information I(y; H). In order to approximate the optimal classifier, the speaker classifiers are updated K times for each training step in our experiments.…”
Section: Recognition Process (mentioning)
Confidence: 97%
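A minimal sketch of the K-step classifier update this statement describes, assuming an LSTM stand-in for the first layer of R and hypothetical sizes; it is not the cited system, only an illustration of approximating the optimal speaker classifier (whose error rate bounds I(y; H)) by updating C_s several times per step before passing the adversarial loss L_adv back to R.

```python
# Sketch: update the speaker classifier K times per training step on frozen
# hidden states, then give the recognizer the adversarial loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

HID, N_SPK, K = 128, 10, 5                       # assumed hidden size, speakers, inner steps

recognizer = nn.LSTM(80, HID, batch_first=True)  # stand-in for R's first LSTM layer
speaker_clf = nn.Linear(HID, N_SPK)              # stand-in for C_s
opt_r = torch.optim.Adam(recognizer.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(speaker_clf.parameters(), lr=1e-4)

def train_step(feats, spk_id, adv_weight=0.1):
    # inner loop: K classifier updates on detached hidden states (L_s)
    h, _ = recognizer(feats)
    h_mean = h.mean(dim=1).detach()
    for _ in range(K):
        loss_s = F.cross_entropy(speaker_clf(h_mean), spk_id)
        opt_c.zero_grad(); loss_s.backward(); opt_c.step()

    # outer step: the recognizer receives the adversarial loss (L_adv);
    # in the full model this would be added to the main recognition loss
    h, _ = recognizer(feats)
    loss_adv = -F.cross_entropy(speaker_clf(h.mean(dim=1)), spk_id)
    opt_r.zero_grad(); (adv_weight * loss_adv).backward(); opt_r.step()
    return loss_s.item(), loss_adv.item()

feats = torch.randn(8, 100, 80)                  # (batch, frames, features)
spk = torch.randint(0, N_SPK, (8,))
print(train_step(feats, spk))
```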
“…Our method is similar to the auto-encoder (AE) based VC with speaker adversarial learning [19][20][21][22]. Polyak et al. [19] proposed a WaveNet based AE model for VC with a speaker confusion network.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Vector quantization based methods [14] have further been proposed to model content information as discrete distributions, which are more closely related to the distribution of phonetic information. An auxiliary adversarial speaker classifier is adopted [15] to encourage the encoder to cast away speaker information from the content information by minimizing the mutual information between their representations [16].…”
Section: Introduction (mentioning)
Confidence: 99%
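For the vector-quantization idea mentioned in this statement, here is a minimal sketch (PyTorch; the codebook size, dimensions, and commitment weight are assumptions, not the cited method's values) of snapping continuous content vectors to their nearest codebook entries with a straight-through estimator, the usual way such discrete content representations are learned.

```python
# Sketch: vector-quantise content embeddings so the content representation
# becomes discrete; gradients pass through via the straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=64, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                        # z: (batch, frames, code_dim)
        flat = z.reshape(-1, z.size(-1))
        # squared distance to every codebook vector, pick the nearest
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)
        q = self.codebook(idx).view_as(z)
        # codebook loss pulls codes toward encoder outputs; commitment loss the reverse
        loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        q = z + (q - z).detach()                 # straight-through estimator
        return q, idx.view(z.shape[:-1]), loss

vq = VectorQuantizer()
z = torch.randn(4, 100, 64)                      # hypothetical encoder output
q, codes, vq_loss = vq(z)
print(q.shape, codes.shape, vq_loss.item())
```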