DGC-Vector: A New Speaker Embedding for Zero-Shot Voice Conversion

Xiao, Ruitong; Zhang, Haitong; Lin, Yue

doi:10.1109/icassp43922.2022.9746278

Cited by 5 publications

(2 citation statements)

References 17 publications

“…However, GAN-based models are usually hard to train. Disentanglement-based approaches such as [12,13,14,15,16] aim to split the speech into spoken content and speaker characteristic (i.e. timbre).…”

Section: Introduction (mentioning)

confidence: 99%

Geography of Technology Transfer in China

Liu

2023

East China Normal University Scientific Reports

Graphs can model complex relationships between objects, enabling a myriad of Web applications such as online page/article classification and social recommendation. While graph neural networks (GNNs) have emerged as a powerful tool for graph representation learning, in an end-to-end supervised setting, their performance heavily relies on a large amount of task-specific supervision. To reduce labeling requirement, the "pre-train, fine-tune" and "pre-train, prompt" paradigms have become increasingly common. In particular, prompting is a popular alternative to fine-tuning in natural language processing, which is designed to narrow the gap between pre-training and downstream objectives in a task-specific manner. However, existing study of prompting on graphs is still limited, lacking a universal treatment to appeal to different downstream tasks. In this paper, we propose GraphPrompt, a novel pre-training and prompting framework on graphs. GraphPrompt not only unifies pre-training and downstream tasks into a common task template, but also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model in a task-specific manner. Finally, we conduct extensive experiments on five public datasets to evaluate and analyze GraphPrompt.

CCS Concepts: Computing methodologies → Learning latent representations; Information systems → Data mining.

“…Speaker embedding methods, such as DGC-VECTOR [7], AutoVC [8], SEVC [9], YourTTS [10] IZSVC [11] and VoiceLoop [12], use a generation process conditioned on speaker embedding. During training, these embeddings are calculated for the training set.…”

Section: Introduction (mentioning)

confidence: 99%

Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models

Levkovitch, Nachmani, Wolf

2022

Preprint

We present a novel way of conditioning a pretrained denoising diffusion speech model to produce speech in the voice of a novel person unseen during training. The method requires a short (∼ 3 seconds) sample from the target person, and generation is steered at inference time, without any training steps. At the heart of the method lies a sampling process that combines the estimation of the denoising model with a low-pass version of the new speaker's sample. The objective and subjective evaluations show that our sampling method can generate a voice similar to that of the target speaker in terms of frequency, with an accuracy comparable to state-of-the-art methods, and without training.
