ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414321
Similarity Analysis of Self-Supervised Speech Representations

Abstract: Self-supervised speech representation learning has recently been a prosperous research topic. Many algorithms have been proposed for learning useful representations from large-scale unlabeled data, and their applications to a wide range of speech tasks have also been investigated. However, there has been little research focusing on understanding the properties of existing approaches. In this work, we aim to provide a comparative study of some of the most representative self-supervised algorithms. Specifically,…

Cited by 26 publications (13 citation statements) | References 25 publications (35 reference statements)
“…They observed that standard design recipes do not translate directly from end-to-end training to self-supervision with the number of filters being a significant factor. Finally, some approaches for post-hoc analysis of the learned representations include studying the similarity of the learned representations to the gold standard that is supervised learning [10,20,25], and understanding the intrinsic dimensionality of the representations [23,71].…”
Section: Analysis Methods For Understanding Self-supervised Approaches
confidence: 99%
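The similarity analyses cited here typically compare layer activations from two models over the same set of frames. The sketch below is a minimal implementation of one widely used measure, linear centered kernel alignment (CKA); the function name and the random toy matrices are illustrative assumptions, not code from the cited papers, which may use other similarity measures.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (n_frames, d1) features from one model (e.g., a self-supervised encoder).
    Y: (n_frames, d2) features from another model (e.g., a supervised baseline).
    Returns a similarity score in [0, 1].
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style cross- and self-similarity terms for the linear kernel.
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    self_x = np.linalg.norm(X.T @ X, "fro")
    self_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (self_x * self_y)

# Toy usage: two random "layer activations" over the same 1,000 frames.
rng = np.random.default_rng(0)
ssl_feats = rng.normal(size=(1000, 768))         # e.g., a self-supervised layer
supervised_feats = rng.normal(size=(1000, 512))  # e.g., a supervised ASR layer
print(f"Linear CKA: {linear_cka(ssl_feats, supervised_feats):.3f}")
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling, it is a common choice when the two models being compared have different feature dimensions.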
“…(9) Masked reconstruction has the highest implicit dimensionality and thus more efficiently makes use of the learned representation space. (10) Utilizing the means and variances of the source dataset normalization on the target dataset results in considerable performance gains.…”
Section: Lessons Learned and Insights Gained
confidence: 99%
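As a concrete reading of these two points, the sketch below quantifies effective dimensionality with a participation ratio and normalizes target-domain features using statistics computed on the source (pre-training) data. Both function names, the participation-ratio choice, and the toy feature matrices are assumptions for illustration; the cited work does not prescribe this exact code or measure.

```python
import numpy as np

def participation_ratio(feats):
    """One common proxy for the effective (implicit) dimensionality of a
    representation: (sum of covariance eigenvalues)^2 / sum of squared eigenvalues."""
    cov = np.cov(feats, rowvar=False)
    eig = np.linalg.eigvalsh(cov)
    return eig.sum() ** 2 / (eig ** 2).sum()

def source_stats_normalize(target_feats, source_feats, eps=1e-8):
    """Normalize target-domain features with the per-dimension mean and variance
    estimated on the source dataset, rather than on the target itself."""
    mu = source_feats.mean(axis=0, keepdims=True)
    sigma = source_feats.std(axis=0, keepdims=True)
    return (target_feats - mu) / (sigma + eps)

# Hypothetical usage with pre-computed (n_frames, feat_dim) feature matrices.
rng = np.random.default_rng(1)
source = rng.normal(loc=0.0, scale=1.0, size=(5000, 80))  # source-domain features
target = rng.normal(loc=0.5, scale=2.0, size=(2000, 80))  # domain-shifted target features
print("effective dimensionality of source features:", participation_ratio(source))
normalized = source_stats_normalize(target, source)
```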
“…A growing number of self-supervised speech models have been proposed. Examples include contrastive predictive coding (CPC) [16,29], auto-regressive predictive coding [30], wav2vec [31], HuBERT [32,33], wav2vec 2.0 [12,34] and WavLM [35], with all showing promising results for a variety of different speech processing tasks. Two particularly popular approaches, HuBERT and wav2vec 2.0, have been applied to automatic speech recognition [12,13], mispronunciation detection [36,37], speaker recognition [38,39] and emotion recognition [40].…”
Section: Related Work
confidence: 99%
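As background for how such pre-trained models are used as feature extractors in the downstream tasks listed above, the sketch below pulls frame-level representations from a wav2vec 2.0 checkpoint. It assumes the Hugging Face transformers library and the facebook/wav2vec2-base checkpoint; neither is prescribed by the cited works, and other checkpoints or toolkits work analogously.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Assumed checkpoint name on the Hugging Face hub; swap in any wav2vec 2.0 model.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

# One second of silent 16 kHz audio stands in for a real utterance here.
waveform = torch.zeros(16000)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states holds one (batch, frames, dim) tensor per layer; these
# frame-level representations are what downstream probes and similarity analyses consume.
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)
```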
“…One advantage is that a large amount of data can be used even if they do not have target labels. This is a popular topic in the speech processing community, and many models have been proposed: wav2vec [77], wav2vec2 [4], VQ-wav2vec [3], contrastive predictive coding [19], auto-regressive predictive coding [19], and HuBERT [36]. Readers are encouraged to check the related references and other papers in major conferences (e.g., ICASSP and Interspeech special sessions, and the NISP workshop).…”
Section: Front End: DNN-based Self-supervised Training Approach
confidence: 99%