Interspeech 2020
DOI: 10.21437/interspeech.2020-1539

Sub-Band Knowledge Distillation Framework for Speech Enhancement

Abstract: Tiny, causal models are crucial for embedded audio machine learning applications. Model compression can be achieved via distilling knowledge from a large teacher into a smaller student model. In this work, we propose a novel two-step approach for tiny speech enhancement model distillation. In contrast to the standard approach of a weighted mixture of distillation and supervised losses, we firstly pre-train the student using only the knowledge distillation (KD) objective, after which we switch to a fully superv…
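A minimal sketch of the two-step schedule described in the abstract, assuming hypothetical `teacher` and `student` PyTorch modules, a loader yielding (noisy, clean) pairs, and an L1 objective in both steps; this is an illustration under those assumptions, not the authors' released code.

```python
# Sketch only: hypothetical teacher/student modules and loader; L1 losses assumed.
import torch
import torch.nn.functional as F

def train_two_step(teacher, student, loader, kd_epochs=10, sup_epochs=10, lr=1e-3):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()

    # Step 1: pre-train the student against the teacher's enhanced output only (KD objective).
    for _ in range(kd_epochs):
        for noisy, _clean in loader:
            with torch.no_grad():
                t_est = teacher(noisy)              # teacher's enhanced estimate
            loss = F.l1_loss(student(noisy), t_est)
            opt.zero_grad(); loss.backward(); opt.step()

    # Step 2: switch to a fully supervised objective against the clean reference.
    for _ in range(sup_epochs):
        for noisy, clean in loader:
            loss = F.l1_loss(student(noisy), clean)
            opt.zero_grad(); loss.backward(); opt.step()

    return student
```

The contrast with the usual recipe is that the distillation and supervised objectives are applied sequentially rather than mixed as a weighted sum.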

Cited by 17 publications (4 citation statements)
References 42 publications

Citation statements (ordered by relevance):
“…However, experiments conducted in [21] revealed that the teacher model can only provide assistance once the knowledge has been refined, regardless of whether the teacher model has more or fewer parameters than the student model. In recent years, there have been studies exploring the application of KD in the field of SE [22]- [29]. While KD has shown great success in image classification tasks, several variants, such as feature mimic [30], [31] and self-KD [32], [33], have been derived and proven effective.…”
Section: Introduction (mentioning)
confidence: 99%
“…A low-latency online extension of wave-U-net was proposed in [15], which directly reduces the difference between the teacher and student output. Teacher-student learning was used in [16] to train a general subband enhancement model. However, these methods did not study the intermediate representation of the DNN model.…”
Section: Introduction (mentioning)
confidence: 99%
“…Quantization reduces the bit width of weights and operators for faster inference while pruning removes the less important weights for less resource usage. Knowledge distillation method [17] trains a small network under the supervision of a larger network, and this has also been used in SE [18]. All these methods are applied at training stage and cannot be utilized to dynamically decrease the computational load according to the characteristics of input data during inference.…”
Section: Introduction (mentioning)
confidence: 99%
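The last statement above contrasts static compression techniques (quantization, pruning, knowledge distillation), applied at training time, with dynamic inference-time approaches. As a rough, hedged illustration of the first two on a hypothetical toy network, using standard PyTorch utilities rather than code from any of the cited papers:

```python
# Illustration only: toy model; the pruning amount and quantization dtype are arbitrary choices.
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(
    torch.nn.Linear(257, 512), torch.nn.ReLU(), torch.nn.Linear(512, 257)
)

# Pruning: zero out the 30% smallest-magnitude weights of each Linear layer,
# then make the pruning permanent so the mask is folded into the weights.
for m in model:
    if isinstance(m, torch.nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.3)
        prune.remove(m, "weight")

# Dynamic quantization: store Linear weights in int8 for lighter CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```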