Interspeech 2022
DOI: 10.21437/interspeech.2022-802
Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

Abstract: Most state-of-the-art self-supervised speaker verification systems rely on a contrastive-based objective function to learn speaker representations from unlabeled speech data. We explore different ways to improve the performance of these methods by: (1) revisiting how positive and negative pairs are sampled through a "symmetric" formulation of the contrastive loss; (2) introducing margins similar to AM-Softmax and AAM-Softmax that have been widely adopted in the supervised setting. We demonstrate the effectiven…
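The two ideas named in the abstract, a "symmetric" contrastive loss over positive/negative pairs and AM/AAM-Softmax-style margins, can be illustrated with a short sketch. The following PyTorch snippet is a minimal illustration of the general technique, not the authors' implementation: the function name, the margin and scale values, and the choice to apply an additive angular margin only to the positive (diagonal) pairs are assumptions.

```python
import torch
import torch.nn.functional as F

def angular_margin_contrastive_loss(z1, z2, margin=0.1, scale=30.0):
    """Symmetric contrastive loss with an AAM-style additive angular margin.

    z1, z2: (N, D) embeddings of two augmented views of the same N utterances.
    Hypothetical helper for illustration only.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    # Cosine similarity between every cross-view pair; the diagonal holds positives.
    cos = z1 @ z2.t()                                   # (N, N)
    # Additive angular margin: replace cos(theta) by cos(theta + m) on positives.
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    cos_margin = torch.cos(theta + margin)
    pos_mask = torch.eye(len(z1), dtype=torch.bool, device=z1.device)
    logits = torch.where(pos_mask, cos_margin, cos) * scale
    labels = torch.arange(len(z1), device=z1.device)
    # "Symmetric" formulation: both view orders contribute to the objective.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Usage sketch with random embeddings (batch of 8, 192-dim speaker embeddings):
# loss = angular_margin_contrastive_loss(torch.randn(8, 192), torch.randn(8, 192))
```

Applying the margin only on the positive pair mirrors how AAM-Softmax penalizes the target class in the supervised setting, while all in-batch negatives keep their plain cosine similarity.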

Cited by 3 publications (1 citation statement)
References 19 publications
“…Among them, W-MSE [14], Barlow Twins [15], and VICReg [10] attempt to produce embedding variables that are decorrelated from each other, whereas CorInfoMax [13] does not constrain the variables to be uncorrelated but instead avoids covariance matrix degeneracy by using a log-determinant regularizer loss function. However, recent investigations show that these regularization terms work effectively only under specific SSL structural settings [10] and strong data augmentation [16]. Note that all these regularization methods [10,13,14,15] adopt an SSL-no-SG structure, where "no-SG" means both branch networks are learnable with no stop-gradient.…”
Section: Related Work
confidence: 99%
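For context on the regularizers contrasted in the statement above, the difference between decorrelation-based terms (W-MSE, Barlow Twins, VICReg) and a CorInfoMax-style log-determinant term can be sketched as follows. This is a rough illustration based only on the descriptions quoted here; the function names and the eps stabilizer are assumptions, not code from any of the cited papers.

```python
import torch

def off_diagonal_decorrelation(z):
    """Decorrelation-style regularizer (in the spirit of Barlow Twins / VICReg):
    penalize off-diagonal entries of the embedding covariance so that
    embedding dimensions stay decorrelated. Illustrative only."""
    z = z - z.mean(dim=0)
    cov = (z.t() @ z) / (len(z) - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum() / z.shape[1]

def log_det_regularizer(z, eps=1e-4):
    """CorInfoMax-style regularizer: maximize the log-determinant of the
    (regularized) covariance, which keeps it away from degeneracy without
    forcing strict decorrelation. Returned as a loss to minimize."""
    z = z - z.mean(dim=0)
    cov = (z.t() @ z) / (len(z) - 1)
    identity = torch.eye(z.shape[1], device=z.device)
    return -torch.logdet(cov + eps * identity)
```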