Interspeech 2021
DOI: 10.21437/interspeech.2021-740
slimIPL: Language-Model-Free Iterative Pseudo-Labeling

Abstract: Recent results in end-to-end ASR have demonstrated the efficacy of simple pseudo-labeling for semi-supervised models trained both with Connectionist Temporal Classification (CTC) and Sequence-to-Sequence (seq2seq) losses. Iterative Pseudo-Labeling (IPL), which continuously trains a single model using pseudo-labels iteratively re-generated as the model learns, has been shown to further increase performance in ASR. We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transc…
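For orientation, the following is a minimal, runnable sketch of the iterative pseudo-labeling loop summarized in the abstract: a single model warms up on the labeled data, then keeps training on labeled plus pseudo-labeled audio while the pseudo-labels are periodically re-generated as the model improves. TinyASRModel, train_step, and transcribe are hypothetical stand-ins, not the authors' implementation, and the slimIPL-specific refinements are not reproduced; the point carried by the title is that no language model is used when generating the pseudo-labels.

# Toy "datasets": labeled pairs (audio, transcript) and unlabeled audio only.
labeled = [("audio_%d" % i, "transcript_%d" % i) for i in range(8)]
unlabeled = ["audio_u%d" % i for i in range(16)]


class TinyASRModel:
    """Placeholder acoustic model; IPL in the paper uses a CTC or seq2seq network."""

    def __init__(self):
        self.steps = 0

    def transcribe(self, audio):
        # Pseudo-label quality would improve as self.steps grows.
        return "hyp(%s)@step%d" % (audio, self.steps)

    def train_step(self, batch):
        self.steps += 1  # stand-in for one gradient update


model = TinyASRModel()

# 1) Warm up on the labeled data only.
for _ in range(3):
    model.train_step(labeled)

# 2) Iteratively re-generate pseudo-labels (without a language model) and keep
#    training the same model on labeled + pseudo-labeled data.
for round_idx in range(4):
    pseudo_labeled = [(a, model.transcribe(a)) for a in unlabeled]
    for _ in range(2):
        model.train_step(labeled + pseudo_labeled)
    print("round", round_idx, "model steps:", model.steps)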

Cited by 26 publications (34 citation statements)
References 76 publications
“…We believe that this is due to overfitting since the fine-tuned model has over 300m parameters. This is in line with recent observations about overfitting in self-training for speech recognition (Likhomanenko et al, 2021).…”
Section: Self-training Strategies (supporting, confidence: 93%)
“…Semi-supervised methods such as self-training, where a model is first trained on labeled data to annotate unlabeled speech, and then subsequently trained on combined golden and self-annotated label-speech pairs, are gaining popularity in the speech community and have yielded competitive results. For comparison, we also show performance from such methods (iterative pseudo labeling (IPL) [64], slimIPL [237], noisy student [63]), as well as the current state of the art (conformer XXL + noisy student [238]), which augments SSL with various advanced techniques including self-training. Results are visualized in black, grey, or orange to reflect the number of labeled utterances used during training (960, 100, or 10 hours).…”
Section: E. Benchmark Results and Discussion (mentioning, confidence: 99%)
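The passage above spells out the standard single-round self-training recipe: train a teacher on the golden pairs, let it annotate the unlabeled speech, then train on the combined data. As a contrast to the continuous IPL/slimIPL loop sketched earlier, here is a minimal toy version of that teacher-student scheme; train() and the callables it returns are illustrative stand-ins, not any cited system.

labeled = [("utt_%d" % i, "gold_%d" % i) for i in range(8)]
unlabeled = ["utt_u%d" % i for i in range(16)]


def train(pairs):
    """Stand-in trainer: returns a 'model' that just remembers its data size."""
    seen = len(pairs)
    return lambda audio: "hyp(%s|trained_on=%d)" % (audio, seen)


# 1) Teacher: supervised training on the golden pairs only.
teacher = train(labeled)

# 2) Teacher annotates the unlabeled audio.
pseudo = [(a, teacher(a)) for a in unlabeled]

# 3) Student: trained on golden + pseudo-labeled pairs. IPL/slimIPL instead keep
#    refining a single model and re-generate the pseudo-labels repeatedly.
student = train(labeled + pseudo)
print(student("utt_new"))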
“…For simplicity, several SSL techniques are appended with suffixes B, L, XL, or XXL indicating the Base, Large, X-Large, or XX-Large variants specified in the original publication. We also include comparisons with semi-supervised, self-training approaches (iterative pseudo labeling (IPL) [64], slimIPL [237], noisy student [63]), as well as the current state of the art (conformer XXL + noisy student [238]), which combines self-training and SSL techniques. These approaches are visualized in black, grey, or orange to reflect the number of labeled utterances (960, 100, or 10 hours) used to train the systems.…”
(mentioning, confidence: 99%)
“…There has been extensive research on the utilization of unpaired data. For speech-only data, the common approach is unsupervised training that serves as a feature extractor for downstream ASR tasks [5,6,7,8], or self-training with pseudo-labels following a typical teacher-student training scheme [9,10]. For text-only data, text is mainly used to train an external language model (LM) for joint decoding [11,12,13,14,15].…”
Section: Introduction (mentioning, confidence: 99%)
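Since this passage notes that text-only data is mainly used to train an external language model for joint decoding (the component slimIPL avoids at pseudo-labeling time), the following is a minimal sketch of greedy joint decoding with shallow-fusion-style LM interpolation. The vocabulary, scores, and lm_weight are toy values and hypothetical stand-ins, not taken from any of the cited works; real models would condition both score functions on the decoded prefix.

import math

vocab = ["a", "b", "</s>"]


def acoustic_log_probs(prefix):
    # Stand-in for per-step acoustic model scores over the vocabulary.
    return {tok: math.log(1.0 / len(vocab)) for tok in vocab}


def lm_log_probs(prefix):
    # Stand-in external LM trained on text-only data; slightly prefers "a".
    return {"a": math.log(0.5), "b": math.log(0.3), "</s>": math.log(0.2)}


def greedy_joint_decode(lm_weight=0.5, max_len=5):
    prefix = []
    for _ in range(max_len):
        am = acoustic_log_probs(prefix)
        lm = lm_log_probs(prefix)
        # Joint score: acoustic log-prob plus weighted external-LM log-prob.
        tok = max(vocab, key=lambda t: am[t] + lm_weight * lm[t])
        if tok == "</s>":
            break
        prefix.append(tok)
    return " ".join(prefix)


print(greedy_joint_decode())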