Interspeech 2022
DOI: 10.21437/interspeech.2022-10251

Multi-View Attention Transfer for Efficient Speech Enhancement

Abstract: Recent deep learning models have achieved high performance in speech enhancement; however, it is still challenging to obtain a fast and low-complexity model without significant performance degradation. Previous knowledge distillation studies on speech enhancement could not solve this problem because their output distillation methods do not fit the speech enhancement task in some aspects. In this study, we propose multi-view attention transfer (MV-AT), a feature-based distillation, to obtain efficient speech en…


Cited by 5 publications (1 citation statement)
References: 26 publications
“…To alleviate this issue, [16] proposed aligning intermediate features, while [17] used attention maps to do so. The latter was applied in the context of SE in [18] using considerably large, non-causal student models intended for offline applications. In [19], the authors addressed the dimensionality mismatch problem for the causal SE models by using frame-level Similarity Preserving KD [20] (SPKD).…”
Section: Introduction
confidence: 99%
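The attention-map alignment mentioned in the citation above ([17], applied to SE in [18]) can be sketched as follows. This is a generic illustration of activation-based attention transfer with hypothetical feature tensors and shapes, not the exact MV-AT formulation from the paper; one appeal of the approach is that the maps discard the channel dimension, sidestepping the teacher/student dimensionality mismatch.

```python
import numpy as np

def attention_map(features: np.ndarray) -> np.ndarray:
    """Collapse a (channels, time, freq) feature tensor into a spatial
    attention map: sum squared activations over channels, then L2-normalize."""
    amap = np.sum(features ** 2, axis=0)        # shape: (time, freq)
    return amap / (np.linalg.norm(amap) + 1e-12)

def attention_transfer_loss(teacher_feats: np.ndarray,
                            student_feats: np.ndarray) -> float:
    """MSE between normalized teacher and student attention maps.
    Channel counts may differ, since the maps are channel-free."""
    t_map = attention_map(teacher_feats)
    s_map = attention_map(student_feats)
    return float(np.mean((t_map - s_map) ** 2))

# Hypothetical intermediate features: a wide teacher, a slim student,
# sharing time/frequency resolution but not channel width.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((64, 100, 257))
student = rng.standard_normal((16, 100, 257))
loss = attention_transfer_loss(teacher, student)
```

In training, a loss like this would be summed over several teacher/student layer pairs and added to the ordinary enhancement objective of the student.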