Interspeech 2022
DOI: 10.21437/interspeech.2022-429
Cross-Layer Similarity Knowledge Distillation for Speech Enhancement

Abstract: Speech enhancement (SE) algorithms based on deep neural networks (DNNs) often encounter challenges of limited hardware resources or strict latency requirements when deployed in real-world scenarios. However, a strong enhancement effect typically requires a large DNN. In this paper, a knowledge distillation framework for SE is proposed to compress the DNN model. We study the strategy of cross-layer connection paths, which fuses multi-level information from the teacher and transfers it to the student. To adapt to…
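The abstract is truncated here, so the paper's exact cross-layer formulation is not shown. The following is only a rough PyTorch sketch of the general idea as stated above: every student layer is matched against multiple teacher layers through batch-level similarity matrices. All function names, the layer pairings, and the uniform averaging are illustrative assumptions, not the paper's configuration.

# Minimal PyTorch sketch of cross-layer similarity knowledge distillation.
# Layer pairings, normalisation, and uniform averaging are assumptions.
import torch
import torch.nn.functional as F

def batch_similarity(feat):
    # Flatten each batch item and build a row-normalised [b, b] similarity matrix.
    b = feat.size(0)
    flat = feat.reshape(b, -1)             # [b, c*t*f]
    gram = flat @ flat.t()                 # [b, b]
    return F.normalize(gram, p=2, dim=1)

def cross_layer_similarity_kd(teacher_feats, student_feats):
    # Fuse multi-level teacher information: every student layer output is
    # compared against every teacher layer output via their similarity matrices.
    loss = 0.0
    for s_feat in student_feats:
        g_s = batch_similarity(s_feat)
        for t_feat in teacher_feats:
            g_t = batch_similarity(t_feat).detach()  # teacher is frozen
            loss = loss + F.mse_loss(g_s, g_t)
    return loss / (len(student_feats) * len(teacher_feats))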

Cited by 6 publications (6 citation statements)
References 22 publications
“…Inspired by previous work [19,20], we address the issue of dimensionality mismatch between teacher and student models by computing similarity-based distillation losses. The method captures and compares the relationship between batch items at each layer output, between teacher and student (Fig.…”
Section: Self-similarity Local Knowledge Distillation (mentioning, confidence: 99%)
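As a concrete illustration of how a batch-similarity loss sidesteps the teacher/student dimensionality mismatch described in the statement above, the sketch below reduces activations with different channel counts to matching [b, b] matrices before comparing them. The channel sizes and the MSE comparison are assumptions for illustration only, not the citing paper's exact setup.

# Teacher and student activations with mismatched channel counts both reduce
# to a [b, b] self-similarity matrix, so they can be compared directly.
import torch
import torch.nn.functional as F

def self_similarity(x):
    b = x.size(0)
    flat = x.reshape(b, -1)                            # [b, c*t*f]
    return F.normalize(flat @ flat.t(), p=2, dim=1)    # [b, b]

teacher_act = torch.randn(8, 256, 100, 161)   # [b, c_teacher, t, f], made-up sizes
student_act = torch.randn(8, 64, 100, 161)    # [b, c_student, t, f], fewer channels

kd_loss = F.mse_loss(self_similarity(student_act),
                     self_similarity(teacher_act).detach())
print(kd_loss.item())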
“…The original implementation from [20] involves reshaping X to [b, ctf] and matrix multiplying it by its transpose X^T to obtain the [b, b] symmetric self-similarity matrix G. Analogously, this operation can be performed for each t or f dimension independently, with resulting G_t/f matrices of size [t/f, b, b]. Such an increase in granularity improved the KD performance in [19]. Here, we obtain even more detailed intra-activation Gram matrices by considering each (t, f) bin separately, resulting in the G_tf self-similarity matrix with shape [t, f, b, b].…”
Section: Self-similarity Local Knowledge Distillation (mentioning, confidence: 99%)
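The three granularities quoted above can be written down directly from the stated shapes. The sketch below is an illustrative PyTorch rendering of those shapes only; the function names are my own and it does not reproduce the citing paper's full distillation loss.

# Self-similarity Gram matrices at three granularities for X of shape [b, c, t, f].
import torch

def gram_global(x):
    # G: reshape X to [b, c*t*f] and multiply by its transpose -> [b, b].
    b = x.size(0)
    flat = x.reshape(b, -1)
    return flat @ flat.t()

def gram_per_time(x):
    # G_t: one [b, b] matrix per time frame -> [t, b, b] (same idea applies per f).
    xt = x.permute(2, 0, 1, 3).reshape(x.size(2), x.size(0), -1)   # [t, b, c*f]
    return xt @ xt.transpose(1, 2)                                 # [t, b, b]

def gram_per_tf_bin(x):
    # G_tf: one [b, b] matrix per (t, f) bin, summing over channels -> [t, f, b, b].
    return torch.einsum('bctf,dctf->tfbd', x, x)

x = torch.randn(4, 16, 10, 8)        # [b, c, t, f]
print(gram_global(x).shape)          # torch.Size([4, 4])
print(gram_per_time(x).shape)        # torch.Size([10, 4, 4])
print(gram_per_tf_bin(x).shape)      # torch.Size([10, 8, 4, 4])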