ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414487

Contrastive Predictive Coding Supported Factorized Variational Autoencoder For Unsupervised Learning Of Disentangled Speech Representations

Abstract: In this work we address the disentanglement of style and content in speech signals. We propose a fully convolutional variational autoencoder employing two encoders: a content encoder and a style encoder. To foster disentanglement, we propose adversarial contrastive predictive coding. This new disentanglement method requires neither parallel data nor any supervision. We show that the proposed technique is capable of separating speaker and content traits into the two different representations and show competitive s…
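To make the two-encoder layout concrete, below is a minimal sketch of a factorized VAE with a frame-wise content encoder and an utterance-level style encoder. This is not the authors' implementation: all layer sizes and module names are hypothetical, and the adversarial CPC regularizer the paper adds on top is omitted here.

```python
# Illustrative sketch only; dimensions and architecture are assumptions.
import torch
import torch.nn as nn

class FactorizedVAE(nn.Module):
    """Two-encoder VAE: a per-frame content encoder and an utterance-level
    style encoder feed a shared decoder that reconstructs the input features."""
    def __init__(self, feat_dim=80, content_dim=64, style_dim=64):
        super().__init__()
        self.content_enc = nn.Sequential(   # 1-D convs over time, per-frame output
            nn.Conv1d(feat_dim, 256, 5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 2 * content_dim, 5, padding=2),  # mean and log-variance
        )
        self.style_enc = nn.Sequential(
            nn.Conv1d(feat_dim, 256, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),         # collapse time into one style vector
            nn.Conv1d(256, 2 * style_dim, 1),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(content_dim + style_dim, 256, 5, padding=2), nn.ReLU(),
            nn.Conv1d(256, feat_dim, 5, padding=2),
        )

    @staticmethod
    def reparameterize(stats):
        mu, logvar = stats.chunk(2, dim=1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, x):                    # x: (batch, feat_dim, time)
        z_c = self.reparameterize(self.content_enc(x))   # (B, C, T) content codes
        z_s = self.reparameterize(self.style_enc(x))     # (B, S, 1) style code
        z = torch.cat([z_c, z_s.expand(-1, -1, z_c.size(-1))], dim=1)
        return self.decoder(z), z_c, z_s
```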

Cited by 13 publications (8 citation statements)
References 10 publications
“…So, if the reconstruction is good enough, the procedure can detach the network from local minima and may be repeated numerous times. The most significant novelty of the suggested method is the simple confirmation of the results of assigning classes to an unknown collection of values using quantifiable criteria [64, 76].…”
Section: Proposed S3IDS
Mentioning confidence: 99%
“…In wav2vec [3], the CPC [1] loss is used to pre-train speech representations for speech recognition, and experimental results show that self-supervised pre-training improves supervised speech recognition. The CPC loss can also be used to regularize adversarial training [2], and it has been extended and applied to bidirectional context networks [6].…”
Section: Related Work
Mentioning confidence: 99%
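Since several of these citing works build on the CPC objective, a generic InfoNCE-style formulation is sketched below. This is a textbook version with in-batch negatives, not the exact loss used in wav2vec or the cited papers; the function and argument names are hypothetical.

```python
# Generic InfoNCE-style CPC loss sketch; details (e.g. negative sampling)
# differ across the cited papers.
import torch
import torch.nn.functional as F

def cpc_loss(context, future, projection):
    """context:    (batch, dim) summary c_t from the context network
       future:     (batch, dim) encoding z_{t+k} of a future step
       projection: a step-specific linear map from context to encoding space.
    Each item's true future is the positive; other batch items are negatives."""
    pred = projection(context)            # (B, dim) prediction of z_{t+k}
    logits = pred @ future.t()            # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)  # classify the true future per row
```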
“…One successful application of APC is as a pre-training method for learning transferable speech representations [11,13]. [14] proposes another approach, using contrastive predictive coding to support factorized disentangled representation learning for speech signals.…”
Section: Introduction
Mentioning confidence: 99%
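For contrast with the CPC loss above, autoregressive predictive coding (APC) as mentioned in this statement uses a regression target rather than a contrastive one. The sketch below follows the generic APC recipe of predicting a frame a few steps ahead; the network sizes and the time shift are assumptions, not values from the cited works.

```python
# Minimal APC pre-training sketch; hyperparameters are illustrative.
import torch
import torch.nn as nn

class APC(nn.Module):
    def __init__(self, feat_dim=80, hidden=512, shift=3):
        super().__init__()
        self.shift = shift
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=3, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim)

    def loss(self, x):                       # x: (batch, time, feat_dim)
        h, _ = self.rnn(x[:, :-self.shift])  # encode only past frames
        pred = self.proj(h)                  # predict frames `shift` steps ahead
        target = x[:, self.shift:]
        return nn.functional.l1_loss(pred, target)  # regression, not contrastive
```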