2022
DOI: 10.1109/jstsp.2022.3188113

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Cited by 365 publications (158 citation statements)
References 79 publications

“…Such pre-trained features can help mitigate the small amount of overlapped speech data, because they were learned from a huge quantity of data. Our work is based on the new feature extractor created by Microsoft, named WavLM [19]. This system is a self-supervised model built with transformer blocks and trained on Mix94k, a 94k-hour corpus drawn from LibriLight, VoxPopuli and GigaSpeech.…”
Section: Pre-trained Features For Audio Segmentation
confidence: 99%
“…WavLM: The second set of features is extracted with WavLM [19]. For OSD, we used the large version of WavLM, which returns 1024-dimensional vectors per frame, without fine-tuning the model.…”
Section: Features
confidence: 99%
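
The setup quoted above (a frozen WavLM-Large producing one 1024-dimensional vector per frame) maps onto the publicly released checkpoints. Below is a minimal Python sketch of that extraction step, assuming the Hugging Face checkpoint microsoft/wavlm-large; the citing paper does not state which tooling it used, so the checkpoint name and API here are illustrative, not the authors' exact pipeline.

import numpy as np
import torch
from transformers import AutoFeatureExtractor, WavLMModel

# Assumption: the public Hugging Face checkpoint for WavLM-Large.
CKPT = "microsoft/wavlm-large"
extractor = AutoFeatureExtractor.from_pretrained(CKPT)
model = WavLMModel.from_pretrained(CKPT).eval()  # frozen, no fine-tuning

# Placeholder input: 3 seconds of 16 kHz audio (replace with a real waveform).
waveform = np.random.randn(16000 * 3).astype(np.float32)
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, num_frames, 1024)

print(hidden.shape)  # one 1024-dimensional vector per ~20 ms frame

WavLM's convolutional front end strides at roughly 20 ms, so a 3-second clip yields on the order of 150 of these 1024-dimensional frame vectors.
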
“…The field of unsupervised representation learning is established enough that it is no longer necessary to channel it through special events. Indeed, self-supervised audio models are such an active domain that there are many relevant new models (for example, WavLM [125]) which have yet to be evaluated on the ZRC metrics. Existing benchmarks, especially for Tasks 2 and 4, also still have a lot of potential for improvement, without creating more difficult tasks.…”
Section: The Future Of the Zero Resource Speech Challenge
confidence: 99%
“…For deriving SSL embeddings, we make use of the following publicly available state-of-the-art (SOTA) pre-trained SSL systems: Wav2Vec2 (Baevski et al., 2020), HuBERT (Hsu et al., 2021), and WavLM (Chen et al., 2021). These systems are among the top three performing networks on the SUPERB challenge (Yang et al., 2021), an SSL benchmark for speech processing tasks.…”
Section: Extracting Neural Embedding-based Fixed-length Feature Repre…
confidence: 99%
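
The passage quoted above names the three encoders but not how their variable-length frame outputs become fixed-length feature representations. A common choice is mean pooling over time, which is what this sketch assumes; the checkpoint names are illustrative public releases, not necessarily the ones the cited authors used.

import numpy as np
import torch
from transformers import AutoFeatureExtractor, AutoModel

# Assumed public checkpoints; the paper cites the systems, not exact releases.
CHECKPOINTS = {
    "wav2vec2": "facebook/wav2vec2-large",
    "hubert": "facebook/hubert-large-ll60k",
    "wavlm": "microsoft/wavlm-large",
}

def fixed_length_embedding(waveform: np.ndarray, checkpoint: str) -> torch.Tensor:
    """Mean-pool frame-level SSL features into one utterance-level vector."""
    extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint).eval()  # frozen encoder
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        frames = model(**inputs).last_hidden_state  # (1, num_frames, hidden_dim)
    return frames.mean(dim=1).squeeze(0)  # (hidden_dim,), length-independent

# Placeholder input: 2 seconds of 16 kHz audio.
audio = np.random.randn(16000 * 2).astype(np.float32)
for name, ckpt in CHECKPOINTS.items():
    print(name, fixed_length_embedding(audio, ckpt).shape)  # 1024-dim for all three large models

Because the pooled vector's size depends only on the encoder's hidden dimension, utterances of any duration map to embeddings of the same length, which is what makes them usable as fixed-length features downstream.
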