2022
DOI: 10.1109/jstsp.2022.3197315

Non-Contrastive Self-Supervised Learning for Utterance-Level Information Extraction From Speech

Abstract: In recent studies, self-supervised pre-trained models tend to outperform supervised pre-trained models in transfer learning. In particular, self-supervised learning of utterance-level speech representation can be used in speech applications that require discriminative representation of consistent attributes within an utterance: speaker, language, emotion, and age. Existing frame-level self-supervised speech representation, e.g., wav2vec, can be used as utterance-level representation with pooling, but the model…
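As a rough illustration of the pooling idea mentioned in the abstract, the sketch below mean-pools frame-level wav2vec 2.0 features into a single utterance-level vector. The HuggingFace `transformers` model name and API are assumptions for illustration, not the authors' setup.

```python
# Minimal sketch: deriving an utterance-level embedding from frame-level
# wav2vec 2.0 features by temporal mean pooling. The model checkpoint and the
# transformers library are illustrative assumptions, not the paper's method.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

waveform = torch.randn(16000 * 3)  # stand-in for a 3-second, 16 kHz utterance

with torch.no_grad():
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    frames = model(**inputs).last_hidden_state   # (1, T_frames, 768) frame-level features
    utterance_embedding = frames.mean(dim=1)     # (1, 768) utterance-level vector

# utterance_embedding could then feed a speaker / language / emotion / age classifier.
```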

Cited by 9 publications (5 citation statements) | References 36 publications

“…II. For speaker verification, MCL-DPP achieves an EER of 2.89%, 3.34%, and 6.47% on Vox-O, Vox-E, and Vox-H, respectively, outperforming the best prior work, i.e., Cho et al. [40], by 40.17% on Vox-O. For face verification, it also achieves an EER of 1.74% on Vox-O.…”
Section: Results and Analysis (mentioning)
confidence: 88%
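The equal error rate (EER) quoted above is the operating point where false-acceptance and false-rejection rates coincide. A rough sketch of how it can be computed from verification scores follows; the scores and labels are synthetic placeholders, not results from any of the cited systems.

```python
# Rough sketch: equal error rate (EER) from verification trial scores.
# Scores/labels are synthetic; real evaluations use trial lists such as Vox-O.
import numpy as np

def eer(scores: np.ndarray, labels: np.ndarray) -> float:
    """labels: 1 for target (same-speaker) trials, 0 for non-target trials."""
    order = np.argsort(scores)[::-1]        # sweep thresholds from high to low score
    labels = labels[order]
    tp = np.cumsum(labels)                  # accepted targets at each threshold
    fp = np.cumsum(1 - labels)              # accepted non-targets (false acceptances)
    fnr = 1.0 - tp / labels.sum()           # false-rejection rate
    fpr = fp / (1 - labels).sum()           # false-acceptance rate
    idx = np.argmin(np.abs(fnr - fpr))      # point where the two rates cross
    return float((fnr[idx] + fpr[idx]) / 2)

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
labels = np.array([1,   1,   0,   1,   0,   0  ])
print(f"EER ~ {eer(scores, labels):.2%}")
```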
“…Other comparison-based self-supervised learning techniques include the MOCO framework [38], [39], which stores negative pairs in a memory bank, and the DINO framework [12], [40]-[42], which involves only positive pairs and achieves considerable improvement. For efficiency and effectiveness, we adopt the SCL framework in this study and focus on the sampling strategy of positive pairs.…”
Section: B. Self-Supervised Learning of Speaker Encoder (mentioning)
confidence: 99%
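For readers unfamiliar with the comparison-based objectives this citation refers to, the sketch below shows a generic in-batch contrastive loss over positive pairs (NT-Xent / SimCLR style). It is only an illustration of the idea; it is not the exact MOCO, DINO, or SCL objective of the cited works.

```python
# Generic sketch of an in-batch contrastive loss over positive pairs
# (NT-Xent / SimCLR style). Illustrative only; not the cited works' objectives.
import torch
import torch.nn.functional as F

def ntxent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two augmented views of the same N utterances."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                    # (2N, D) stacked views
    sim = z @ z.t() / temperature                     # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                 # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])  # index of the positive
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 192), torch.randn(8, 192)     # dummy speaker embeddings
print(ntxent_loss(z1, z2))
```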
“…These defenses can fall under the following categories: detect and remove the attack vector [28]-[30]; implement non-differentiable functions to obscure gradients [31], [32]; sanitize the attack vector to eliminate adversarial perturbations [33]-[35]; and apply formal verification [36]-[38] or certification techniques [39]-[41] to provide performance guarantees. Defenses applied to training data protect against poisoning attacks by filtering out potentially poisoned data samples [42]-[46]. Defenses within the training algorithm employ robust training techniques, such as adversarial training.…”
Section: Defense Preparation (mentioning)
confidence: 99%
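One of the defense categories listed above, robust training via adversarial training, can be sketched as a single FGSM-style training step on a generic classifier. The model, data, and step size below are placeholders for illustration and are not drawn from the cited survey.

```python
# Illustrative sketch of adversarial training (one FGSM inner step), as an
# example of the "robust training" defense category. Placeholders throughout.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, epsilon=0.01):
    # 1) Craft an FGSM perturbation that increases the classification loss.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # 2) Update the model on a mix of clean and adversarial examples.
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))
print(adversarial_training_step(model, x, y, opt))
```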
“…Recently, motivated by the surge of self-supervised learning concepts, many deep embedding methods [7,8,9,10,11] have proven to be very effective in benefiting from the massive amount of unlabeled data. The code associated with this article is publicly available at https://github.com/theolepage/sslsv.…”
Section: Introduction (mentioning)
confidence: 99%