ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414619

Two-Stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding

Abstract: End-to-end approaches open a new way for more accurate and efficient spoken language understanding (SLU) systems by alleviating the drawbacks of traditional pipeline systems. Previous works exploit textual information for an SLU model via pre-training with automatic speech recognition or fine-tuning with knowledge distillation. To utilize textual information more effectively, this work proposes a two-stage textual knowledge distillation method that matches utterance-level representations and predicted logits of …
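The abstract only sketches the method, so below is a minimal, hypothetical illustration of what a two-stage distillation objective of this kind could look like in PyTorch: one stage matching utterance-level representations (here with an MSE loss) and one stage matching predicted logits (here with a temperature-scaled KL divergence). The function names, loss choices, and tensor shapes are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch of a two-stage textual knowledge distillation objective.
# Loss choices and names are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def representation_distillation_loss(speech_repr, text_repr):
    # Stage 1: match utterance-level representations of the speech model
    # (student) and the text model (teacher), e.g. with an MSE loss.
    return F.mse_loss(speech_repr, text_repr.detach())

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Stage 2: match predicted logits via temperature-scaled KL divergence.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Example usage with random tensors standing in for encoder outputs and
# intent-classifier logits (shapes are assumptions).
speech_repr = torch.randn(8, 768)     # student utterance-level representations
text_repr = torch.randn(8, 768)       # teacher (text model) representations
student_logits = torch.randn(8, 31)   # e.g. 31 intent classes
teacher_logits = torch.randn(8, 31)

stage1_loss = representation_distillation_loss(speech_repr, text_repr)
stage2_loss = logit_distillation_loss(student_logits, teacher_logits)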

Cited by 5 publications (1 citation statement)
References 18 publications
“…By simultaneously optimizing text and image generation models, they improved the quality and consistency of image generation. Kim et al. introduced a knowledge distillation method from speech to text, named Speech2Text Distillation [35], leveraging pretrained speech recognition models to enhance text generation models. They significantly improved the performance of speech-to-text tasks through cross-modal distillation.…”
Section: Cross-modal Distillation
Citation type: mentioning
confidence: 99%