2019
DOI: 10.1109/taslp.2019.2929859

General Sequence Teacher–Student Learning

Abstract: In automatic speech recognition, performance gains can often be obtained by combining an ensemble of multiple models. However, this can be computationally expensive when performing recognition. Teacher-student learning alleviates this cost by training a single student model to emulate the combined ensemble behaviour. Only this student needs to be used for recognition. Previously investigated teacher-student criteria often limit the forms of diversity allowed in the ensemble, and only propagate information from…

Cited by 7 publications (4 citation statements)
References 28 publications
“…where X is an input sequence, θ_T is the teacher, θ_S is the student, and H is a hypothesis, which may be expressed as a sequence of words, sub-word units, or states [27]. This study considers state sequences.…”
Section: Boosting Accuracy With Sequence-Level Teacher-Student Learning
mentioning, confidence: 99%
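
The criterion the quoted statement refers to is, in outline, the sequence-level teacher-student criterion of [27]: a KL divergence between the teacher's and the student's posteriors over hypothesis sequences. The LaTeX sketch below is a hedged reconstruction from the names defined in the quote (X, θ_T, θ_S, H); the exact hypothesis space, in practice restricted to a lattice, and the treatment of state sequences follow [27] rather than this outline.

% KL divergence between teacher and student hypothesis-sequence posteriors,
% given the input sequence X. Minimising over theta_S is equivalent to a
% cross-entropy, since the teacher entropy term is constant in theta_S.
\mathcal{F}(\theta_S)
  = \mathrm{KL}\!\left(
      P(\mathcal{H} \mid \mathbf{X}, \theta_T)
      \,\middle\|\,
      P(\mathcal{H} \mid \mathbf{X}, \theta_S)
    \right)
  = -\sum_{\mathcal{H}}
      P(\mathcal{H} \mid \mathbf{X}, \theta_T)\,
      \log \frac{P(\mathcal{H} \mid \mathbf{X}, \theta_S)}
                {P(\mathcal{H} \mid \mathbf{X}, \theta_T)} .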
“…where α_k is the combination weight for the kth teacher, θ_{T_k}, satisfying Σ_k α_k = 1 and α_k ≥ 0. The contribution to the T/S gradient from the teachers can be obtained by performing a separate forward-backward operation over each of the teachers' lattices, which represent each teacher's hypotheses [27]. This is computationally expensive, especially when using a large amount of training data.…”
Section: Boosting Accuracy With Sequence-Level Teacher-Student Learning
mentioning, confidence: 99%
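
The teacher combination the statement describes can be written out as follows. This is a hedged sketch built only from the constraints quoted above (weights α_k over teachers θ_{T_k}, summing to one); the lattice-level details are those of [27].

% Combined teacher posterior: a weighted sum of the K individual
% teachers' hypothesis-sequence posteriors.
P(\mathcal{H} \mid \mathbf{X}, \Theta_T)
  = \sum_{k=1}^{K} \alpha_k \, P(\mathcal{H} \mid \mathbf{X}, \theta_{T_k}),
  \qquad \sum_{k=1}^{K} \alpha_k = 1, \quad \alpha_k \ge 0 .

The gradient of the teacher-student criterion with respect to the student then decomposes into per-teacher terms, each requiring its own forward-backward pass over that teacher's lattice, which is the cost the statement highlights.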
“…Wu et al. [29] proposed a multi-teacher knowledge distillation framework that compresses the model by transferring knowledge from multiple teachers to a single student model. Wong et al. [30] proposed training the student from an ensemble with a diversity of state cluster sets. Simultaneous distillation algorithms [31] train a set of models at the same time, with the models learning from each other in a peer-teaching manner.…”
Section: Related Work
mentioning, confidence: 99%
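
To make the multi-teacher distillation idea shared by [29]-[31] concrete, here is a minimal frame-level sketch in PyTorch. Everything in it is assumed for illustration: hypothetical teacher and student networks producing per-frame logits over a shared state set, and fixed combination weights. It is not the lattice-based sequence-level method of [27] or [30], nor the specific frameworks of [29] or [31].

# Hedged sketch: frame-level multi-teacher knowledge distillation.
# Assumes teachers and student all output (batch, frames, states) logits
# over the same state set; weights are the alpha_k combination weights.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, weights):
    # Combine teacher posteriors: sum_k alpha_k * P(state | x, teacher_k).
    teacher_post = sum(w * F.softmax(t, dim=-1)
                       for w, t in zip(weights, teacher_logits_list))
    # Cross-entropy of the student against the combined teacher posterior
    # (equal to the KL divergence up to a term constant in the student).
    log_student = F.log_softmax(student_logits, dim=-1)
    return -(teacher_post * log_student).sum(dim=-1).mean()

# Example usage with random tensors (8 utterances, 100 frames, 2000 states):
# teachers = [torch.randn(8, 100, 2000) for _ in range(3)]
# student  = torch.randn(8, 100, 2000, requires_grad=True)
# loss = multi_teacher_kd_loss(student, teachers, weights=[0.5, 0.3, 0.2])
# loss.backward()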