Interspeech 2022
DOI: 10.21437/interspeech.2022-742
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction

Abstract: For self-supervised speaker verification, the quality of pseudo labels decides the upper bound of the system due to the massive unreliable labels. In this work, we propose dynamic loss-gate and label correction (DLG-LC) to alleviate the performance degradation caused by unreliable estimated labels. In DLG, we adopt Gaussian Mixture Model (GMM) to dynamically model the loss distribution and use the estimated GMM to distinguish the reliable and unreliable labels automatically. Besides, to better utilize the unre…
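
As a rough illustration of the loss-gating idea described in the abstract, the sketch below fits a two-component GMM to the per-sample losses and treats samples assigned to the low-mean component as having reliable pseudo labels. This is a minimal sketch, not the authors' implementation; the `losses` array, the use of scikit-learn's GaussianMixture, and the 0.5 posterior cut-off are assumptions made for illustration.

```python
# Minimal sketch of GMM-based loss gating (illustrative, not the authors' code).
# Assumes `losses` holds the per-sample training losses collected in one epoch.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_reliable(losses: np.ndarray) -> np.ndarray:
    """Fit a 2-component GMM to the loss distribution and keep samples
    assigned to the low-mean (reliable) component."""
    x = losses.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    reliable_comp = int(np.argmin(gmm.means_.ravel()))   # low-loss Gaussian
    posteriors = gmm.predict_proba(x)[:, reliable_comp]  # P(reliable | loss)
    return posteriors > 0.5                              # boolean mask over samples

# Example: keep only the samples the gate marks as reliable
losses = np.random.gamma(shape=2.0, scale=0.5, size=1000)  # stand-in losses
mask = select_reliable(losses)
print(f"kept {mask.sum()} / {len(mask)} samples")
```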

Cited by 16 publications (3 citation statements) | References 28 publications
“…Due to its popularity, it also has been implemented in open-set speaker recognition. Han et al [27] introduced a DINO-based framework that required two-stage training. The first stage includes training the DINO framework.…”
Section: Related Work
confidence: 99%
“…It means that loss-gate can effectively select reliable labels which are of benefit to the model. However, we also try to set different thresholds (1, 3, 5), and find that the choice of threshold also has a non-negligible impact on model performance [29]. Based on the estimated GMM, our proposed dynamic loss-gate (DLG) can adjust the threshold dynamically considering the current training situation and obtains better performance than LG which only adopts a fixed threshold during the whole training process.…”
Section: B. Evaluation of CA-DINO with Pretrain-Finetune Framework Wit...
confidence: 99%
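
The quoted passage contrasts a fixed loss-gate threshold (e.g. 1, 3, or 5) with a threshold derived dynamically from the estimated GMM. Below is a minimal sketch of how such a per-epoch dynamic threshold could be computed; the grid search over loss values and the 0.5 posterior crossover are illustrative assumptions, not the paper's exact rule.

```python
# Illustrative sketch of a dynamic loss threshold derived from the fitted GMM,
# recomputed every epoch instead of using a fixed value such as 1, 3, or 5.
import numpy as np
from sklearn.mixture import GaussianMixture

def dynamic_threshold(losses: np.ndarray, grid_size: int = 1000) -> float:
    """Return the loss value where the posterior of the low-loss (reliable)
    component drops to 0.5; losses below it are treated as reliable."""
    x = losses.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    reliable_comp = int(np.argmin(gmm.means_.ravel()))
    grid = np.linspace(losses.min(), losses.max(), grid_size).reshape(-1, 1)
    post = gmm.predict_proba(grid)[:, reliable_comp]
    below = np.where(post < 0.5)[0]
    return float(grid[below[0], 0]) if below.size else float(losses.max())

# Recompute the gate each epoch from the current loss distribution:
# threshold = dynamic_threshold(current_epoch_losses)
# reliable_mask = current_epoch_losses < threshold
```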
“…Speaker verification (SV) is a task that utilizes speech as the biometric feature to verify the speakers' identities. Recently, deep learning methods have been widely applied for speaker verification (SV) tasks and many efforts have been made, such as various model architectures [2], [3], [4], [5], [6], training objectives [7], [8], [9], pooling methods [10], [11] and so on, to achieve excellent performance compared with traditional methods such as Gaussian Mixture Model-Universal Background Model (GMM-UBM) [12] and i-vector [13]. Part of the results have been presented at Interspeech 2022 [1]. All the authors are with the X-Lance Lab, Department of Computer Science and Engineering & MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240 P. R. China (e-mail: {hanbing97, zhengyang.chen, yanminqian}@sjtu.edu.cn).…”
Section: Introduction
confidence: 99%