2018 26th European Signal Processing Conference (EUSIPCO), 2018
DOI: 10.23919/eusipco.2018.8553236

CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks

Cited by 298 publications (239 citation statements)
References 41 publications
“…A deep bidirectional long short-term memory based recurrent neural networks was proposed by [21] to improve the naturalness of voice conversion models. To overcome the need for parallel data in building conversion function for VC models [22] proposed a novel architecture that uses a cycle-consistent adversarial network.…”
Section: Vowels and Prosody Contribution In Neural Network Based Voic…
Citation type: mentioning
confidence: 99%
“…While these systems offer the advantage of being able to generate novel TTS voice samples given a few seconds of reference audio, the quality of TTS is inferior [25] compared to single-speaker TTS models. In our system, we employ another recent work [14] that uses a CycleGAN architecture to achieve good voice transfer between two human speakers with no loss in linguistic features. We train this model to perform a cross-language transfer of a synthetic TTS voice to a natural target speaker voice.…”
Section: Voice Transfer In Audio
Citation type: mentioning
confidence: 99%
“…As our TTS model only generates audio samples in a single voice, we personalize this voice to match the voice of different target speakers. As collecting parallel training data for the same speaker across languages is infeasible, we adopt the CycleGAN architecture [14] to work around this problem.…”
Section: Personalizing Speaker Voice
Citation type: mentioning
confidence: 99%