Interspeech 2019
DOI: 10.21437/Interspeech.2019-2818

Acoustic Model Bootstrapping Using Semi-Supervised Learning

Abstract: This work aims at bootstrapping acoustic model training for automatic speech recognition with small amounts of human-labeled speech data and large amounts of machine-labeled speech data. Semi-supervised learning is investigated to select the machine-transcribed training samples. Two semi-supervised learning methods are proposed: one is the local-global uncertainty based method, which introduces both the local uncertainty from the current utterance and the global uncertainty from the whole data pool into the data s…
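To make the selection idea concrete, below is a minimal sketch of combined local-global uncertainty scoring, assuming a seed recognizer that emits a per-utterance confidence and a word-level hypothesis. All names (`select_utterances`, `confidence`, `hyp_words`, the weight `alpha`) are hypothetical, and the exact uncertainty definitions and keep/discard direction follow the paper's full text, which the truncated abstract does not show; this sketch simply interpolates a local confidence term with a pool-level word-frequency term.

```python
import math

def select_utterances(pool, budget, alpha=0.5):
    """Rank machine-transcribed utterances by a combined uncertainty score.

    pool   : list of dicts, each with "confidence" (seed-recognizer score
             in [0, 1], the local signal) and "hyp_words" (hypothesized
             word list, used for the pool-level global signal).
    budget : number of utterances to keep for retraining.
    alpha  : interpolation weight between local and global uncertainty.
    """
    # Global signal: relative frequency of each hypothesized word in the pool.
    word_counts, total = {}, 0
    for utt in pool:
        for w in utt["hyp_words"]:
            word_counts[w] = word_counts.get(w, 0) + 1
            total += 1

    def global_uncertainty(utt):
        # Average negative log relative frequency of the hypothesized words:
        # words that are rare across the whole pool raise the global term.
        if not utt["hyp_words"]:
            return 0.0
        return sum(-math.log(word_counts[w] / total)
                   for w in utt["hyp_words"]) / len(utt["hyp_words"])

    def score(utt):
        local = 1.0 - utt["confidence"]  # local uncertainty of this utterance
        return alpha * local + (1.0 - alpha) * global_uncertainty(utt)

    # Assumption: keep the lowest-uncertainty (most reliable) machine labels,
    # up to the budget; the paper's criterion may differ.
    return sorted(pool, key=score)[:budget]
```

With alpha = 1.0 this reduces to plain confidence filtering; the global term lets pool-wide word statistics modulate which machine-labeled utterances are admitted into training.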

Cited by 5 publications (2 citation statements)
References 13 publications (23 reference statements)

Citation statements, ordered by relevance:
“…We demonstrate that a step-wise distillation approach, introduced in [23], can be effective, although this comes at the cost of more computation at training time. In low data regimes, SSL is an effective technique to reduce annotation costs [14,17,21,5]. For our second task, using knowledge distillation for SSL, we find that to achieve performance comparable to that of a fully supervised system, the proportion of required supervised data decreases as the amount of total data increases.…”
Section: Introduction (mentioning, confidence: 95%)
“…• teacher-student learning if non-transcribed data is available in the target domain [19,20,21,22,23,24,25].…”
Section: Introduction (mentioning, confidence: 99%)
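As a companion to the citation context above, here is a minimal sketch of the teacher-student setup it refers to: a teacher model labels non-transcribed target-domain audio with soft posteriors, and a student is trained to match them. All names are illustrative, the models are assumed to be PyTorch modules returning frame-level logits, and the cited papers [19-25] differ in their exact formulations.

```python
import torch

def pseudo_label_step(teacher, student, optimizer, untranscribed_batch):
    """One teacher-student update on a batch of non-transcribed audio.

    The teacher (e.g. a model trained on transcribed source-domain data)
    produces soft frame-level posteriors; the student is trained to
    match them on target-domain audio that has no human transcription.
    """
    teacher.eval()
    with torch.no_grad():
        # Soft targets: per-frame posteriors from the frozen teacher.
        soft_targets = torch.softmax(teacher(untranscribed_batch), dim=-1)

    student.train()
    log_probs = torch.log_softmax(student(untranscribed_batch), dim=-1)

    # Cross-entropy against the teacher's soft distribution
    # (equivalent to KL divergence up to the teacher's entropy).
    loss = -(soft_targets * log_probs).sum(dim=-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because no ground-truth labels enter this step, the quality of the selected or distilled data (as in the uncertainty-based selection sketched earlier) determines how much the student can gain over the teacher.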