Interspeech 2020
DOI: 10.21437/interspeech.2020-1539

Sub-Band Knowledge Distillation Framework for Speech Enhancement

Abstract: Tiny, causal models are crucial for embedded audio machine learning applications. Model compression can be achieved via distilling knowledge from a large teacher into a smaller student model. In this work, we propose a novel two-step approach for tiny speech enhancement model distillation. In contrast to the standard approach of a weighted mixture of distillation and supervised losses, we firstly pre-train the student using only the knowledge distillation (KD) objective, after which we switch to a fully superv…
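A minimal sketch of the two-step schedule described in the abstract, assuming hypothetical `teacher` and `student` PyTorch modules, a loader yielding (noisy, clean) pairs, and an L1 objective in both steps; this is an illustration under those assumptions, not the authors' released code.

```python
# Sketch only: hypothetical teacher/student modules and loader; L1 losses assumed.
import torch
import torch.nn.functional as F

def train_two_step(teacher, student, loader, kd_epochs=10, sup_epochs=10, lr=1e-3):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()

    # Step 1: pre-train the student against the teacher's enhanced output only (KD objective).
    for _ in range(kd_epochs):
        for noisy, _clean in loader:
            with torch.no_grad():
                t_est = teacher(noisy)              # teacher's enhanced estimate
            loss = F.l1_loss(student(noisy), t_est)
            opt.zero_grad(); loss.backward(); opt.step()

    # Step 2: switch to a fully supervised objective against the clean reference.
    for _ in range(sup_epochs):
        for noisy, clean in loader:
            loss = F.l1_loss(student(noisy), clean)
            opt.zero_grad(); loss.backward(); opt.step()

    return student
```

The contrast with the usual recipe is that the distillation and supervised objectives are applied sequentially rather than mixed as a weighted sum.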

Cited by 17 publications (4 citation statements)
References 42 publications

Citation statements (ordered by relevance):
“…However, experiments conducted in [21] revealed that the teacher model can only provide assistance once the knowledge has been refined, regardless of whether the teacher model has more or fewer parameters than the student model. In recent years, there have been studies exploring the application of KD in the field of SE [22]- [29]. While KD has shown great success in image classification tasks, several variants, such as feature mimic [30], [31] and self-KD [32], [33], have been derived and proven effective.…”
Section: Introduction (mentioning)
confidence: 99%
“…A low-latency online extension of wave-U-net was proposed in [15], which directly reduces the difference between the teacher and student output. Teacher-student learning was used in [16] to train a general subband enhancement model. However, these methods did not study the intermediate representation of the DNN model.…”
Section: Introduction (mentioning)
confidence: 99%
“…Quantization reduces the bit width of weights and operators for faster inference while pruning removes the less important weights for less resource usage. Knowledge distillation method [17] trains a small network under the supervision of a larger network, and this has also been used in SE [18]. All these methods are applied at training stage and cannot be utilized to dynamically decrease the computational load according to the characteristics of input data during inference.…”
Section: Introduction (mentioning)
confidence: 99%
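The last statement above contrasts static compression techniques (quantization, pruning, knowledge distillation), applied at training time, with dynamic inference-time approaches. As a rough, hedged illustration of the first two on a hypothetical toy network, using standard PyTorch utilities rather than code from any of the cited papers:

```python
# Illustration only: toy model; the pruning amount and quantization dtype are arbitrary choices.
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(
    torch.nn.Linear(257, 512), torch.nn.ReLU(), torch.nn.Linear(512, 257)
)

# Pruning: zero out the 30% smallest-magnitude weights of each Linear layer,
# then make the pruning permanent so the mask is folded into the weights.
for m in model:
    if isinstance(m, torch.nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.3)
        prune.remove(m, "weight")

# Dynamic quantization: store Linear weights in int8 for lighter CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```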