2022
DOI: 10.1109/jstsp.2022.3207050
Self-Supervised Speech Representation Learning: A Review


Citations: Cited by 144 publications (45 citation statements)
References: 286 publications
“…We measure the extent of acoustic, phonetic, and word content encoded in individual layers for 11 pre-trained models, using a lightweight analysis tool based on canonical correlation analysis (CCA). We find that phonetic and word information concentrates in different layers for different models, and that the layer-wise trends relate to the pre-training objectives despite differences in training data. We also report CCA measurements for a randomly initialized model and find that the observed trends are not explained by an inductive bias of the model architecture alone. (The codebase will be made available at: https://github.com/ankitapasad/layerwise-analysis/)…”
Section: Introduction (mentioning)
confidence: 99%
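As a rough illustration of the kind of lightweight layer-wise probing described in that statement, the sketch below scores each layer of a pre-trained model by a CCA-based similarity between its frame-level features and one-hot phone labels. The inputs `layer_reps` and `phone_labels`, the choice of plain CCA (the cited tool may use more refined CCA variants), and the number of canonical components are all assumptions for illustration, not the authors' released codebase.

```python
# Minimal sketch (not the authors' released tool) of CCA-based layer-wise probing.
# Assumed inputs: `layer_reps` maps layer index -> (n_frames, dim) array of
# frame-level features from a pre-trained model; `phone_labels` is an (n_frames,)
# array of integer phone IDs aligned to the same frames.
import numpy as np
from sklearn.cross_decomposition import CCA


def cca_similarity(x, y, n_components=10):
    """Mean correlation over the top canonical components of x and y."""
    cca = CCA(n_components=n_components, max_iter=1000)
    x_c, y_c = cca.fit_transform(x, y)
    corrs = [np.corrcoef(x_c[:, i], y_c[:, i])[0, 1] for i in range(n_components)]
    return float(np.mean(corrs))


def layerwise_phone_cca(layer_reps, phone_labels, n_phones, n_components=10):
    """Score each layer by CCA between its features and one-hot phone labels."""
    one_hot = np.eye(n_phones)[phone_labels]  # (n_frames, n_phones)
    return {
        layer: cca_similarity(feats, one_hot, n_components)
        for layer, feats in sorted(layer_reps.items())
    }
```

Plotting these scores against layer index gives the kind of layer-wise trend the statement refers to; running the same probe on a randomly initialized copy of the architecture is the control that checks the trend is not an artifact of the architecture alone.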
“…Self-supervised models have become a nearly ubiquitous approach for learning speech representations and improving performance on downstream tasks [1][2][3][4][5], but our understanding of their properties and strategies for their use is still limited. Some recent work has begun developing an understanding of the extent and location of different acoustic and linguistic information encoded by these models [6][7][8][9][10], which in some cases has resulted in improved fine-tuning strategies [8,9].…”
Section: Introduction (mentioning)
confidence: 99%
“…Mohamed, A., Lee, H., Borgholt, L., Havtorn, J.D., Edin, J., Igel, C., Kirchhoff, K., Li, S., Livescu, K., Maaloe, L., Sainath, T.N., and Watanabe, S., Self-Supervised Speech Representation Learning: A Review; JSTSP Oct. 2022 1179-1210…”
Section: 2022 JSTSP annual author index (mentioning)
confidence: 99%
“…SSL pre-trains a shared representation model on a huge amount of unlabeled data. The pre-trained SSL model can then be used for various downstream tasks with minimal adaptation, either by fine-tuning it or by using the learned representations from the frozen model [1]. Applying an SSL model to different downstream tasks can significantly lower the entry barrier for model development compared to training from scratch.…”
Section: Introduction (mentioning)
confidence: 99%
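For concreteness, here is a minimal sketch of the frozen-representation route mentioned in that statement, using the Hugging Face transformers library. The checkpoint name facebook/wav2vec2-base, the layer index, and the helper name frozen_features are illustrative assumptions, not a recipe from the reviewed paper; fine-tuning would instead unfreeze the encoder and update it together with a task head.

```python
# Minimal sketch (illustrative, not from the reviewed paper): extract frame-level
# features from a frozen pre-trained wav2vec 2.0 encoder for a downstream task.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_NAME = "facebook/wav2vec2-base"  # assumed checkpoint; other SSL encoders work similarly
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME)
model.eval()  # frozen: the encoder receives no gradient updates


def frozen_features(waveform, sample_rate=16000, layer=-1):
    """Return (n_frames, hidden_dim) features from one layer of the frozen encoder."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        out = model(inputs.input_values, output_hidden_states=True)
    return out.hidden_states[layer].squeeze(0)
```

Training only a lightweight classifier on these frozen features is the minimal-adaptation option the statement describes; because the shared encoder is reused across tasks, each new downstream model is cheap to build compared to training from scratch.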