2020
DOI: 10.48550/arxiv.2008.03822
Preprint

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

Cited by 8 publications (7 citation statements) · References 22 publications
“…Previous works like shallow fusion [24] and cold fusion [25] aim to combine an auto-regressive LM with an S2S ASR model, which is randomly initialized. To leverage the power of pretrained LMs like BERT, many works concentrate on knowledge distillation of BERT for ASR [26,27], though progress has been limited [28]. More recently, some studies aim to integrate BERT into non-autoregressive ASR models [28,6].…”
Section: Related Work
confidence: 99%
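
The shallow fusion mentioned in this excerpt log-linearly combines the S2S ASR model's next-token scores with those of an external auto-regressive LM at decoding time. Below is a minimal sketch of one decoding step; the function name, the greedy selection, and the `lm_weight` value are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def shallow_fusion_step(asr_log_probs: np.ndarray,
                        lm_log_probs: np.ndarray,
                        lm_weight: float = 0.3) -> int:
    """One step of shallow fusion: log-linearly interpolate the ASR
    model's next-token distribution with an external LM's distribution.
    `lm_weight` is a tunable hyperparameter (assumed value here)."""
    fused = asr_log_probs + lm_weight * lm_log_probs
    # Greedy choice for brevity; practical systems apply this score
    # inside beam search at every hypothesis expansion.
    return int(np.argmax(fused))
```

In practice `lm_weight` is tuned on a development set; cold fusion differs in that the LM is integrated during training rather than only at decode time.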
“…To use the linguistic information from BERT (Devlin et al., 2018) for improving ASR performance, some works (Chiu and Chen, 2021; Shin et al., 2019; Wang and Cho, 2019) use BERT to rerank the N-best hypotheses generated by the ASR model. Besides, knowledge distillation (Futami et al., 2020) … (Shin et al., 2019). (b) Cascade methods directly cascade the BERT decoder on top of the wav2vec 2.0 encoder through a Length Alignment module.…”
Section: Speech Recognition With BERT
confidence: 99%
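
The N-best reranking this excerpt describes is typically implemented by scoring each hypothesis with BERT's pseudo log-likelihood: mask one token at a time and sum the log-probabilities BERT assigns to the original tokens. A minimal sketch using the Hugging Face transformers library follows; the `rerank` interface, the hypothesis dictionary keys, and the `lm_weight` value are assumptions for illustration, not the cited papers' exact setup.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def pseudo_log_likelihood(sentence: str) -> float:
    """Mask each token in turn and sum the log-probability BERT
    assigns to the original token (pseudo log-likelihood)."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def rerank(nbest, lm_weight=0.5):
    """Pick the hypothesis with the best interpolated score; `nbest`
    is assumed to be a list of {"text": ..., "asr_score": ...} dicts."""
    return max(nbest, key=lambda h: h["asr_score"]
               + lm_weight * pseudo_log_likelihood(h["text"]))
```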
“…The original usage of BERT mainly focused on NLP tasks, from token-level to sequence-level classification, including question answering [9,10], document summarization [11,12], information retrieval [13,14], and machine translation [15,16], just to name a few. There have also been attempts to incorporate BERT into ASR, including rescoring [17,18] or generating soft labels for training [19]. In this section, we review the fundamentals of BERT.…”
Section: BERT
confidence: 99%
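
The "soft labels for training" route [19] distills BERT's predicted token distributions into the S2S decoder as an auxiliary target. Below is a minimal sketch of such a distillation loss, assuming precomputed BERT soft labels; `alpha` and `temperature` are hypothetical hyperparameters, and this is not necessarily the exact loss used in the cited work.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,   # [N, vocab]
                      bert_soft_labels: torch.Tensor, # [N, vocab], probs
                      hard_targets: torch.Tensor,     # [N], token ids
                      alpha: float = 0.5,
                      temperature: float = 1.0) -> torch.Tensor:
    """Interpolate the usual cross-entropy with a KL term pulling the
    S2S decoder's distribution toward BERT's soft labels."""
    ce = F.cross_entropy(student_logits, hard_targets)
    log_q = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_q, bert_soft_labels, reduction="batchmean")
    # Scaling the KL term by T^2 keeps gradient magnitudes comparable
    # across temperatures, following standard distillation practice.
    return (1 - alpha) * ce + alpha * (temperature ** 2) * kl
```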