Interspeech 2022
DOI: 10.21437/interspeech.2022-742
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction

Abstract: For self-supervised speaker verification, the quality of pseudo labels decides the upper bound of the system due to the massive unreliable labels. In this work, we propose dynamic loss-gate and label correction (DLG-LC) to alleviate the performance degradation caused by unreliable estimated labels. In DLG, we adopt Gaussian Mixture Model (GMM) to dynamically model the loss distribution and use the estimated GMM to distinguish the reliable and unreliable labels automatically. Besides, to better utilize the unre…
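
As a rough illustration of the loss-gating idea described in the abstract, the sketch below fits a two-component GMM to the per-sample losses and treats samples assigned to the low-mean component as having reliable pseudo labels. This is a minimal sketch, not the authors' implementation; the `losses` array, the use of scikit-learn's GaussianMixture, and the 0.5 posterior cut-off are assumptions made for illustration.

```python
# Minimal sketch of GMM-based loss gating (illustrative, not the authors' code).
# Assumes `losses` holds the per-sample training losses collected in one epoch.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_reliable(losses: np.ndarray) -> np.ndarray:
    """Fit a 2-component GMM to the loss distribution and keep samples
    assigned to the low-mean (reliable) component."""
    x = losses.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    reliable_comp = int(np.argmin(gmm.means_.ravel()))   # low-loss Gaussian
    posteriors = gmm.predict_proba(x)[:, reliable_comp]  # P(reliable | loss)
    return posteriors > 0.5                              # boolean mask over samples

# Example: keep only the samples the gate marks as reliable
losses = np.random.gamma(shape=2.0, scale=0.5, size=1000)  # stand-in losses
mask = select_reliable(losses)
print(f"kept {mask.sum()} / {len(mask)} samples")
```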

Cited by 16 publications (3 citation statements) | References 28 publications
“…Due to its popularity, it also has been implemented in open-set speaker recognition. Han et al [27] introduced a DINO-based framework that required two-stage training. The first stage includes training the DINO framework.…”
Section: Related Work
confidence: 99%
“…It means that loss-gate can effectively select reliable labels which are of benefit to the model. However, we also try to set different thresholds (1, 3, 5), and find that the choice of threshold also has a non-negligible impact on model performance [29]. Based on the estimated GMM, our proposed dynamic loss-gate (DLG) can adjust the threshold dynamically considering the current training situation and obtains better performance than LG which only adopts a fixed threshold during the whole training process.…”
Section: B. Evaluation of CA-DINO with Pretrain-Finetune Framework Wit...
confidence: 99%
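
The quoted passage contrasts a fixed loss-gate threshold (e.g. 1, 3, or 5) with a threshold derived dynamically from the estimated GMM. Below is a minimal sketch of how such a per-epoch dynamic threshold could be computed; the grid search over loss values and the 0.5 posterior crossover are illustrative assumptions, not the paper's exact rule.

```python
# Illustrative sketch of a dynamic loss threshold derived from the fitted GMM,
# recomputed every epoch instead of using a fixed value such as 1, 3, or 5.
import numpy as np
from sklearn.mixture import GaussianMixture

def dynamic_threshold(losses: np.ndarray, grid_size: int = 1000) -> float:
    """Return the loss value where the posterior of the low-loss (reliable)
    component drops to 0.5; losses below it are treated as reliable."""
    x = losses.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    reliable_comp = int(np.argmin(gmm.means_.ravel()))
    grid = np.linspace(losses.min(), losses.max(), grid_size).reshape(-1, 1)
    post = gmm.predict_proba(grid)[:, reliable_comp]
    below = np.where(post < 0.5)[0]
    return float(grid[below[0], 0]) if below.size else float(losses.max())

# Recompute the gate each epoch from the current loss distribution:
# threshold = dynamic_threshold(current_epoch_losses)
# reliable_mask = current_epoch_losses < threshold
```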
“…Speaker verification (SV) is a task that utilizes speech as the biometric feature to verify the speakers' identities. Recently, deep learning methods have been widely applied for speaker verification (SV) tasks and many efforts have been made, such as various model architectures [2], [3], [4], [5], [6], training objectives [7], [8], [9], pooling methods [10], [11] and so on, to achieve excellent performance compared with traditional methods such as Gaussian Mixture Model-Universal Background Model (GMM-UBM) [12] and i-vector [13]. Part of the results have been presented at Interspeech 2022 [1]. All the authors are with the X-Lance Lab, Department of Computer Science and Engineering & MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240 P. R. China (e-mail: {hanbing97, zhengyang.chen, yanminqian}@sjtu.edu.cn).…”
Section: Introduction
confidence: 99%