Interspeech 2020
DOI: 10.21437/interspeech.2020-2989

Sparse Mixture of Local Experts for Efficient Speech Enhancement

Abstract: In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks. By splitting up the speech denoising task into non-overlapping subproblems and introducing a classifier, we are able to improve denoising performance while also reducing computational complexity. More specifically, the proposed model incorporates a gating network which assigns noisy speech signals to an appropriate specialist network based on either speech degradation level or speaker gender.
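The paper itself ships no code, but the gating-plus-specialists design the abstract describes can be sketched as follows. This is a minimal PyTorch sketch under assumed details: the module names (SpecialistDenoiser, SparseMixtureOfExperts), the mask-based denoising formulation, and all layer sizes are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpecialistDenoiser(nn.Module):
    """One expert: a small network predicting a magnitude mask for its
    assigned subproblem (hypothetical design, not the paper's)."""
    def __init__(self, n_freq=513, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq), nn.Sigmoid(),  # mask in [0, 1]
        )

    def forward(self, x):          # x: (batch, time, n_freq) magnitudes
        return self.net(x) * x     # masked (denoised) magnitudes

class SparseMixtureOfExperts(nn.Module):
    """A gating classifier routes each utterance to exactly one specialist,
    so only one expert runs at inference (the source of the savings).
    Per the abstract, the gate would be trained as a classifier over the
    pre-defined groups (degradation level or gender); training omitted."""
    def __init__(self, n_experts=4, n_freq=513):
        super().__init__()
        self.experts = nn.ModuleList(
            SpecialistDenoiser(n_freq) for _ in range(n_experts))
        self.gate = nn.Sequential(            # utterance-level classifier
            nn.Linear(n_freq, 128), nn.ReLU(),
            nn.Linear(128, n_experts))

    def forward(self, x):                     # x: (batch, time, n_freq)
        logits = self.gate(x.mean(dim=1))     # pool over time -> (batch, K)
        choice = logits.argmax(dim=-1)        # hard one-of-K routing
        out = torch.empty_like(x)
        for k, expert in enumerate(self.experts):
            idx = (choice == k).nonzero(as_tuple=True)[0]
            if idx.numel():                   # run only the chosen expert
                out[idx] = expert(x[idx])
        return out, choice
```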

Cited by 11 publications (7 citation statements). References 27 publications.
“…Denoting K to be the number of clusters, one can create K separate SE models trained only to denoise utterances from each disjoint group of similar speakers. As previously shown [12,13], a sparsely active ensemble model is capable of performing zero-shot adaptation because the gating module classifies the test-time noisy input signals into one-of-K groups.…”
Section: Ensemble Models
confidence: 99%
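As a rough illustration of how the K disjoint groups of similar speakers might be formed before training one specialist per group, one could cluster utterance-level speaker embeddings. The embedding source, dimensionality, and K = 4 below are assumptions for illustration; the citing papers do not prescribe them here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder embeddings; in practice one vector per training utterance
# from a pretrained speaker encoder (source and dim are assumptions).
rng = np.random.default_rng(0)
speaker_embeddings = rng.standard_normal((1000, 128))

K = 4  # number of clusters = number of specialist SE models (assumed)
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(
    speaker_embeddings)

# Specialist k is trained only on utterances with labels == k; at test
# time the gating module predicts this same one-of-K label from the
# noisy input, which is what enables zero-shot adaptation.
for k in range(K):
    print(f"group {k}: {(labels == k).sum()} utterances")
```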
“…MLoE is also extended to recurrent neural network models in [11] to learn from the temporal structure of the speech enhancement problem. While in these methods a local expert is not dedicated to solving a specific sub-problem, in [12] an MLoE-based SE system shows substantial improvement by pre-defining two separate partitioning schemes: one based on the quality of the input signal, i.e., in terms of signal-to-noise ratio (SNR), and one based on the gender of the speakers. Moreover, by introducing "sparseness" to the ensemble weights, it performs test-time inference using only the single most suitable specialist.…”
Section: Introduction
confidence: 99%
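The pre-defined partitioning described above can be pictured as a deterministic routing rule over {degradation level} × {gender}. A small sketch follows; the 5 dB SNR boundary is an assumed placeholder, not a value taken from [12].

```python
def assign_specialist(snr_db: float, gender: str) -> int:
    """Map an utterance to one of four pre-defined subproblems:
    {low SNR, high SNR} x {female, male}. The 5 dB boundary is assumed."""
    snr_bin = 0 if snr_db < 5.0 else 1
    gender_bin = 0 if gender == "female" else 1
    return 2 * snr_bin + gender_bin  # specialist index in {0, 1, 2, 3}

# Example: a 0 dB utterance from a male speaker routes to specialist 1;
# only that network runs at inference ("sparse" ensemble weights).
print(assign_specialist(0.0, "male"))   # -> 1
```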
“…However, ZSL for speech enhancement has not been widely studied. In [17], a mixture of local experts model was introduced as a ZSL solution for test-time adaptation of an SE model. It achieves the adaptation goal by employing a classifier to select the most suitable of several pre-trained specialist models for a given noisy test signal.…”
Section: Introduction
confidence: 99%
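Tying this back to the PyTorch sketch given after the abstract (whose names and sizes were assumptions), zero-shot test-time adaptation amounts to a single forward pass: the classifier picks one pre-trained specialist for the unseen noisy signal, with no fine-tuning.

```python
import torch

# Reuses the (assumed) SparseMixtureOfExperts sketch from above.
model = SparseMixtureOfExperts(n_experts=4, n_freq=513).eval()
noisy = torch.rand(1, 200, 513)   # one unseen utterance: (batch, time, freq)
with torch.no_grad():
    enhanced, choice = model(noisy)
print("specialist selected zero-shot:", choice.item())
```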