ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053578
Distilling Attention Weights for CTC-Based ASR Systems

Cited by 11 publications (7 citation statements) | References 17 publications
“…As a result, we need a normalisation procedure, which makes the final loss function similar to MMI. There has been other work on modifying CTC [13,14,15], but we are not aware of work looking at the aspect of topology.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
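The quote refers to an MMI-style normalisation of the sequence objective. As a point of reference only, here is a minimal LaTeX sketch of the standard MMI criterion; this is the textbook form, not the cited work's exact formulation, and the choice of denominator hypothesis set is an assumption left open here.

\mathcal{L}_{\mathrm{MMI}} = -\log \frac{p_{\theta}(\mathbf{X} \mid Y_{\mathrm{ref}})\, P(Y_{\mathrm{ref}})}{\sum_{Y} p_{\theta}(\mathbf{X} \mid Y)\, P(Y)}

The denominator sums over competing hypotheses Y, which is the normalisation the quote alludes to; how the cited work restricts or approximates that sum is not specified in the excerpt.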
“…Gao et al. [16] jointly use multiple teacher models to train a student ASR model to improve ASR accuracy. Moriya et al. [17] improve accuracy by adding another term to the loss function, called self-distillation (SD), which comes from incorporating the teacher model.…”
Section: Classic Compression Methods in Language Modeling (citation type: mentioning)
confidence: 99%
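To illustrate what "adding a distillation term to the loss" looks like in practice, below is a minimal PyTorch-style sketch of a CTC loss combined with a frame-level KL term against teacher posteriors. The weight sd_weight, the temperature, and the frame-level KL are illustrative assumptions, not the exact SD formulation of Moriya et al. [17].

import torch
import torch.nn.functional as F

def ctc_with_distillation(student_logits, teacher_logits, targets,
                          input_lengths, target_lengths,
                          blank=0, temperature=2.0, sd_weight=0.5):
    """Sketch: CTC loss plus a frame-level KL distillation term.

    student_logits, teacher_logits: (T, N, C) unnormalised scores.
    The interpolation and temperature scaling follow common KD practice
    and are assumptions, not the cited papers' exact recipe.
    """
    # Standard CTC loss on the student's frame-level log-probabilities.
    log_probs = F.log_softmax(student_logits, dim=-1)
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=blank, zero_infinity=True)

    # Soft targets from the (frozen) teacher; KL averaged over frames.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    return ctc + sd_weight * (temperature ** 2) * kl

In training, teacher_logits would come from a pre-trained model run on the same utterances and kept fixed while the student is updated.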
“…Distillation between different decoder topologies has also been investigated. Moriya et al. distilled knowledge from a teacher AED model to a student CTC model [79]. Self-distillation in a single E2E ASR model has also been proposed as an in-place operation, from an offline mode to a streaming mode [61], [68], and from a Transformer decoder to a CTC layer [80].…”
Section: Knowledge Distillation for Streaming ASR (citation type: mentioning)
confidence: 99%
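For the in-place ("self") variants mentioned above, a hedged way to write the combined objective, assuming a KL divergence between the teacher branch and the CTC branch of the same network, is:

\mathcal{L} = (1-\lambda)\,\mathcal{L}_{\mathrm{CTC}} + \lambda\, \mathrm{KL}\!\left( p_{\mathrm{teacher}} \,\Vert\, p_{\mathrm{CTC}} \right)

Here p_teacher would come from the offline branch or the Transformer decoder of the same model and is typically treated as a constant (no gradient) in the KL term; the weight λ and how the two distributions are aligned (token-level or frame-level) are assumptions for illustration, not details taken from the cited works.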