2022
DOI: 10.48550/arxiv.2204.00771
Preprint

Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation

Abstract: This paper investigates how to improve the runtime speed of personalized speech enhancement (PSE) networks while maintaining the model quality. Our approach includes two aspects: architecture and knowledge distillation (KD). We propose an end-to-end enhancement (E3Net) model architecture, which is 3× faster than a baseline STFT-based model. Besides, we use KD techniques to develop compressed student models without significantly degrading quality. In addition, we investigate using noisy data without reference c…
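
The paper's implementation is not included on this page. As a rough illustration of the end-to-end (waveform-in, waveform-out) design that the abstract contrasts with an STFT-based baseline, the sketch below uses a learned encoder, a recurrent bottleneck, and a learned decoder in PyTorch. The module names, layer sizes, and masking structure are assumptions for illustration only, and the speaker-embedding conditioning used for personalization is omitted; this is not the published E3Net configuration.

```python
# Minimal sketch of a waveform-in / waveform-out enhancement model with a
# learned encoder, recurrent bottleneck, and learned decoder. Layer sizes and
# module choices are illustrative assumptions, not the published E3Net design.
import torch
import torch.nn as nn


class TinyE2EEnhancer(nn.Module):
    def __init__(self, feat_dim=256, kernel=320, stride=160, hidden=256):
        super().__init__()
        # Learned analysis filterbank in place of an STFT front end.
        self.encoder = nn.Conv1d(1, feat_dim, kernel_size=kernel, stride=stride)
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, feat_dim), nn.Sigmoid())
        # Learned synthesis filterbank mapping features back to a waveform.
        self.decoder = nn.ConvTranspose1d(feat_dim, 1, kernel_size=kernel, stride=stride)

    def forward(self, noisy):                    # noisy: (batch, 1, samples)
        feats = torch.relu(self.encoder(noisy))  # (batch, feat_dim, frames)
        x, _ = self.rnn(feats.transpose(1, 2))   # (batch, frames, hidden)
        masked = feats * self.mask(x).transpose(1, 2)
        return self.decoder(masked)              # (batch, 1, ~samples)


if __name__ == "__main__":
    model = TinyE2EEnhancer()
    wav = torch.randn(2, 1, 16000)               # two 1-second 16 kHz clips
    print(model(wav).shape)                      # torch.Size([2, 1, 16000])
```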

Cited by 3 publications (4 citation statements)
References 23 publications (44 reference statements)

“…The results are presented in Table 12. For the baseline we implemented the method according to [49], and achieved results close to the original research in terms of speech clarity. As the evaluation scores were similar when the model size was set to 25%, we performed a two-tailed t-test between the results of fixed KD, Method C, and non-KD to assess their significance, as shown in Table 13 and Table 14.…”
Section: Comparison of CSTR VCTK Dataset
confidence: 66%
“…Knowledge distillation (KD) [94], also known as teacher–student training, refers to training small DNN models with supervision generated by computationally demanding teacher models. The low-cost E3Net [95] also uses KD to leverage unpaired noisy samples. E3Net outperformed the authors' earlier networks with a threefold reduction in computational cost.…”
Section: Techniques for the Reduction in Computational and Memory Req...
confidence: 99%
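
The distillation setup described in the statement above (a large teacher supervising a compact student, including on unpaired noisy clips that have no clean reference) can be sketched roughly as follows. This is a minimal, assumed PyTorch-style training step, not the exact recipe of the cited work; the `student`, `teacher`, and `optimizer` objects, the L1 waveform loss, and the `alpha` weighting are all placeholders.

```python
# Rough sketch of teacher-student knowledge distillation for speech
# enhancement. The teacher's enhanced output acts as a pseudo-target, which
# also works for unpaired noisy clips that have no clean reference.
import torch
import torch.nn.functional as F


def kd_step(student, teacher, optimizer, noisy, clean=None, alpha=0.5):
    """One assumed distillation step (not the cited work's exact recipe).

    noisy : (batch, 1, samples) noisy input waveforms
    clean : matching clean references, or None for unpaired noisy data
    alpha : weight of the distillation term when a clean reference exists
    """
    teacher.eval()
    with torch.no_grad():
        teacher_out = teacher(noisy)              # pseudo-target from teacher

    student_out = student(noisy)
    kd_loss = F.l1_loss(student_out, teacher_out)
    if clean is not None:
        # Paired data: combine the ground-truth loss with the distillation loss.
        loss = (1 - alpha) * F.l1_loss(student_out, clean) + alpha * kd_loss
    else:
        # Unpaired noisy data: the teacher output is the only supervision.
        loss = kd_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

When clean references are available the student mixes the ground-truth loss with the distillation loss; on unpaired noisy data the teacher's enhanced output is the only supervision, which is what lets KD exploit noisy data without reference signals.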
“…It was reported that tensor decomposition gives better STOI than pruning for the same compression rate. KD can lead to reductions in the number of computations (2-4 times), with slight degradation in quality metrics [95]. It was not extensively tested for speech enhancement.…”
Section: Techniques for the Reduction in Computational and Memory Req...
confidence: 99%
“…Personalization has shown promising results in model compression tasks for speech enhancement [15,16,17,18]. A personalized model adapts to the target speaker group's speech traits, narrowing the training task down to a smaller subtask, i.e., one defined by a smaller speaker group than the full set of speakers in the corpus.…”
Section: Introduction
confidence: 99%