ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746369

AVQVC: One-Shot Voice Conversion By Vector Quantization With Applying Contrastive Learning

Cited by 36 publications (7 citation statements)
References 16 publications
“…It makes the model more robust to interference such as noise and reverberation. Beyond speaker identification, other voice-domain tasks such as text-to-speech and voice conversion also embed the X-vector for performance gains [33]-[36].…”
Section: B. Timbral Features
confidence: 99%
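For concreteness, the X-vector referenced in this statement is a fixed-length speaker embedding extracted from a pretrained speaker-verification network. The sketch below obtains such an embedding with a pretrained SpeechBrain model; the model name and calls follow SpeechBrain's documented interface, but treat the specifics as an illustrative assumption rather than the cited papers' setup.

```python
# Hedged sketch: extracting an x-vector speaker embedding with SpeechBrain's
# pretrained VoxCeleb model (illustrative; not the cited papers' exact setup).
import torch
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")
signal = torch.randn(1, 16000)               # stand-in for 1 s of 16 kHz audio
embedding = classifier.encode_batch(signal)  # fixed-length speaker vector
```

Downstream TTS or VC models then condition on this vector rather than on a speaker ID, which is what yields the robustness to unseen speakers and recording conditions described above.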
“…A fixed-length speaker representation is obtained from pre-trained SV models [9]-[12], [15], or extracted from the target speaker's speech through disentanglement techniques such as instance normalization [13], [14], adversarial training [15], information bottleneck [9], [19], [20], or perturbation [10], [18]. To integrate the speaker representation with the other speech components for speech generation, addition [16], concatenation [9], [19], or adaptive instance normalization [13], [14] is commonly used.…”
Section: A. Typical Zero-Shot VC Framework
confidence: 99%
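As a concrete illustration of the adaptive instance normalization option this statement mentions, here is a minimal PyTorch sketch that conditions content features on a speaker embedding. The class, names, and tensor shapes are assumptions for illustration, not any cited paper's code.

```python
# Minimal sketch of adaptive instance normalization (AdaIN) for speaker
# conditioning in voice conversion (shapes and names are assumptions).
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, content_dim: int, speaker_dim: int):
        super().__init__()
        # Predict a per-channel scale and shift from the speaker embedding.
        self.affine = nn.Linear(speaker_dim, 2 * content_dim)

    def forward(self, content: torch.Tensor, speaker: torch.Tensor) -> torch.Tensor:
        # content: (batch, channels, time); speaker: (batch, speaker_dim)
        mean = content.mean(dim=2, keepdim=True)
        std = content.std(dim=2, keepdim=True) + 1e-5
        normalized = (content - mean) / std          # strip per-utterance statistics
        gamma, beta = self.affine(speaker).chunk(2, dim=1)
        return gamma.unsqueeze(2) * normalized + beta.unsqueeze(2)
```

Instance normalization removes per-utterance channel statistics, which carry timbre, and the speaker-predicted scale and shift re-inject the target speaker's statistics.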
“…Thus, the identically distributed assumption is violated, preventing the straightforward application of a model trained in the supervised manner on the target unlabeled dataset. In view of this, the contrastive learning paradigm has achieved impressive successes in a spectrum of practical fields, such as computer vision (Wu et al 2018; Chen et al 2020; Qiang et al 2022), natural language processing (Gao, Yao, and Chen 2021; Qu et al 2021), and signal processing (Yao et al 2022; Tang et al 2022). As tractable surrogates, contrastive approaches have further established promising capacity in the field of unsupervised graph learning, namely GCL methods (You et al 2020).…”
Section: Introduction
confidence: 99%
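To ground the paradigm this statement refers to, below is a minimal InfoNCE-style contrastive loss of the kind popularized by SimCLR (Chen et al 2020); the function name and temperature value are illustrative assumptions.

```python
# Minimal sketch of an InfoNCE-style contrastive loss (SimCLR-like).
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # z1, z2: (batch, dim) embeddings of two augmented views of the same items.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    # Row i's positive is column i; all other columns act as negatives.
    return F.cross_entropy(logits, labels)
```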
“…Another important issue with GCL is negative sampling. BGRL (Thakoor et al 2022) proposes a method that uses simple graph augmentations and does not have to construct negative samples, while scaling effectively to large graphs. To address the issue of sampling bias in GCL, PGCL (Lin et al 2021) proposes a negative sampling method based on semantic clustering.…”
Section: Introduction
confidence: 99%
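To make the contrast with negative sampling concrete, here is a minimal bootstrap-style objective of the kind BGRL uses: an online predictor regresses a slowly updated target encoder, so no negative samples are constructed. Names, shapes, and the decay value are assumptions, not BGRL's exact implementation.

```python
# Minimal sketch of a BGRL-style negative-free objective (names and values
# are assumptions; not the paper's exact code).
import torch
import torch.nn.functional as F

def bootstrap_loss(online_pred: torch.Tensor, target_repr: torch.Tensor) -> torch.Tensor:
    # online_pred: predictor output for view 1 of the graph;
    # target_repr: target-encoder output for view 2 (no gradient through it).
    online_pred = F.normalize(online_pred, dim=1)
    target_repr = F.normalize(target_repr.detach(), dim=1)
    return 2.0 - 2.0 * (online_pred * target_repr).sum(dim=1).mean()

@torch.no_grad()
def ema_update(target: torch.nn.Module, online: torch.nn.Module, decay: float = 0.99) -> None:
    # Move target parameters toward the online parameters (exponential moving average).
    for t, o in zip(target.parameters(), online.parameters()):
        t.mul_(decay).add_(o, alpha=1.0 - decay)
```

Because the target network never receives gradients, there is no term comparing against negatives at all; collapse is avoided by the predictor/EMA asymmetry instead.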