A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion

Huang, Wen-Chin; Yang, Shu-Wen; Hayashi, Tomoki; Toda, Tomoki

doi:10.1109/jstsp.2022.3193761

Cited by 10 publications (2 citation statements); references 68 publications.

“…As a follow up to our prior effort, as presented in [17], this work proposes a novel strategy for anonymization via voice conversion, which, instead of manipulating the xvectors, leverages the approach of ContentVec [36] to obtain speaker-independent speech representations and starts from pre-trained models within the S3PRL toolkit [37]. The proposed strategy is evaluated on a public dataset and compared against a variety of neural and signal-processing-based voice conversion methods.…”

Section: Introduction; classification: mentioning (confidence: 99%)

“…Specifically, it allows us to evaluate the generative capabilities of pre-trained models, as well as the generalizability of the resulting conversion model. The resulting anonymization task was mainly derived from the setup proposed in [37] for voice conversion. The speech embeddings were computed using the introduced disentanglement mechanism on the WavLM features in the present work.…”

Classification: mentioning (confidence: 99%)

Speaker Anonymization: Disentangling Speaker Features from Pre-Trained Speech Embeddings for Voice Conversion

Matassoni, Fong, Brutti

Applied Sciences, 2024

Speech is a rich source of personal information, and the risk of attackers exploiting it continues to grow. Protecting speaker privacy is therefore essential, and various approaches have been proposed to hide a speaker's identity. One such approach is voice anonymization, which aims to conceal speaker identity while preserving speech content, using techniques such as voice conversion or spectral feature alteration. Voice anonymization has become more important with the need to protect personal information in applications such as voice assistants, authentication, and customer support. Building on the S3PRL-VC toolkit and on pre-trained speech and speaker representation models, this paper introduces a feature disentanglement approach that improves the de-identification performance of state-of-the-art voice-conversion-based anonymization methods. The proposed approach achieves state-of-the-art speaker de-identification while having minimal impact on the intelligibility of the converted signal.
