Interspeech 2020
DOI: 10.21437/interspeech.2020-1033

Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition

Cited by 12 publications (4 citation statements)
References 19 publications
“…Pretraining a speaker recognition system offers the advantage of using large-scale speaker databases, enabling the learned speaker representation to exhibit high speaker similarity in several multi-speaker speech generation frameworks [19], [43]- [45]. On the other hand, joint training provides a more flexible optimization process dedicated to the speech synthesis task, providing further insights to characterize the speaker details [46], [47].…”
Section: Neural Speaker Encoding
confidence: 99%
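The excerpt above contrasts pretraining a speaker encoder on large verification corpora with training it jointly with the synthesizer. A minimal sketch of the pretrained-encoder route (all names here are hypothetical stand-ins, not the cited systems' actual models): embed a speaker's utterances into a fixed vector and check that embeddings of the same speaker agree more than those of different speakers.

```python
import numpy as np

# Toy stand-in for a pretrained speaker encoder (real systems use e.g.
# d-vector or x-vector networks trained on speaker-verification data).
rng = np.random.default_rng(2)
EMB_DIM = 16

def speaker_encoder(utterances):
    """Average a speaker's frame features into one unit-norm embedding."""
    emb = np.mean(utterances, axis=0)
    return emb / np.linalg.norm(emb)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two batches from the same speaker share a speaker-specific bias, so
# their embeddings should be closer to each other than to another speaker.
bias = rng.normal(size=EMB_DIM)
spk_a1 = speaker_encoder(bias + 0.1 * rng.normal(size=(20, EMB_DIM)))
spk_a2 = speaker_encoder(bias + 0.1 * rng.normal(size=(20, EMB_DIM)))
spk_b = speaker_encoder(rng.normal(size=(20, EMB_DIM)))

assert cosine(spk_a1, spk_a2) > cosine(spk_a1, spk_b)
```

In a multi-speaker synthesis framework, the resulting embedding would simply be concatenated to (or used to modulate) the synthesizer's hidden states, which is what lets the pretrained route reuse large speaker databases without retraining the encoder.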
“…What GAN- and Flow-based models have in common is that they both bypass the problem of feature decoupling and convert speech directly, while some other works [9,10,11,12,13,14] attempt to disentangle the style unit and the content unit in the embedding space. The purpose is obvious: once content information and timbre information are obtained separately, it is easy to fix the content embedding while replacing the style embedding to convert the voice.…”
Section: Introduction
confidence: 99%
“…One type of method is based on the automatic speech recognition (ASR) model [9,10,11,15]. First, a pretrained speaker-independent ASR model is employed to extract linguistic features (e.g.
Section: Introduction
confidence: 99%
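The ASR-based route described above typically extracts frame-wise phone posteriors (often called phonetic posteriorgrams, PPGs), which carry linguistic content while discarding most speaker identity. A hedged sketch with a random weight matrix standing in for a real pretrained acoustic model:

```python
import numpy as np

# Stand-in for a pretrained speaker-independent acoustic model: a single
# linear layer followed by a softmax over phone classes. A real system
# would use a deep ASR network trained on transcribed speech.
rng = np.random.default_rng(1)
N_FRAMES, N_MELS, N_PHONES = 5, 40, 10
W = rng.normal(size=(N_MELS, N_PHONES))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def phonetic_posteriorgram(mel_frames):
    """Map each mel frame to a distribution over phone classes."""
    return softmax(mel_frames @ W)

mel = rng.normal(size=(N_FRAMES, N_MELS))  # toy mel-spectrogram
ppg = phonetic_posteriorgram(mel)

assert ppg.shape == (N_FRAMES, N_PHONES)
assert np.allclose(ppg.sum(axis=1), 1.0)  # each frame is a distribution
```

Because the posteriors are normalized per frame, they encode *which phone* is being spoken rather than *who* is speaking, which is why a downstream synthesizer conditioned on them can re-render the content in a different voice.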