2016
DOI: 10.1109/taslp.2016.2536478

A Deep Ensemble Learning Method for Monaural Speech Separation

Abstract: Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose a deep ensemble method, named…
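As context for the abstract, a minimal sketch of the two training targets it contrasts: predicting the clean-speech spectrum directly versus predicting an ideal time-frequency mask (here the ideal ratio mask, one common choice). The NumPy framing and array shapes are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch, not the paper's implementation: the ideal ratio mask (IRM),
# one common "ideal time-frequency mask" target, computed from known clean
# speech and noise magnitudes, and its application to a mixture.
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Per T-F bin fraction of energy attributed to speech (values in (0, 1))."""
    return speech_mag**2 / (speech_mag**2 + noise_mag**2 + eps)

def apply_mask(mixture_stft, mask):
    """Multiply an estimated mask onto the mixture STFT to recover speech."""
    return mask * mixture_stft

# Toy magnitudes (freq bins x frames) standing in for real STFT magnitudes.
speech_mag = np.abs(np.random.randn(257, 100))
noise_mag = np.abs(np.random.randn(257, 100))
irm = ideal_ratio_mask(speech_mag, noise_mag)
```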

Cited by 205 publications (107 citation statements)
References 35 publications
“…Recently deep learning has been employed to address speaker separation. The general idea is to train a deep neural network (DNN) to predict T-F masks or spectra of two speakers in a mixture [7], [16], [42]. There are usually two output layers in such a DNN, one for each individual speaker.…”
Section: Introduction (mentioning)
confidence: 99%
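A minimal sketch of the setup this citation statement describes: one shared DNN trunk with two output layers, each predicting a T-F mask for one speaker in the mixture. The PyTorch framing and layer sizes are illustrative assumptions, not taken from the cited papers.

```python
# Hedged sketch: a DNN with two output layers, one mask per speaker.
import torch
import torch.nn as nn

class TwoSpeakerMaskDNN(nn.Module):
    def __init__(self, n_bins=257, hidden=1024):
        super().__init__()
        # Shared trunk over mixture magnitude frames.
        self.trunk = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Two output heads, one per speaker in the two-speaker mixture.
        self.head_spk1 = nn.Linear(hidden, n_bins)
        self.head_spk2 = nn.Linear(hidden, n_bins)

    def forward(self, mixture_frames):
        h = self.trunk(mixture_frames)
        # Sigmoid keeps each predicted mask value in (0, 1).
        return torch.sigmoid(self.head_spk1(h)), torch.sigmoid(self.head_spk2(h))

# Usage on a batch of mixture magnitude frames (batch x n_bins).
mask1, mask2 = TwoSpeakerMaskDNN()(torch.rand(8, 257))
```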
“…Lately, there has been increasing interest in nonlinear models, specifically Deep Neural Networks (DNNs) [21,22,23,24]. In Deep Clustering (DPCL) [25,26], first, the time-frequency bins of the mixtures are mapped into an embedding space; then, a clustering algorithm is performed in the embedding space; finally, a binary mask is generated based on each cluster to reconstruct the speech of each speaker.…”
Section: Introduction (mentioning)
confidence: 99%
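A minimal sketch of the DPCL inference pipeline this statement outlines: embed each T-F bin, cluster the embeddings, and form one binary mask per cluster. The random embedding below is a stand-in for the trained embedding network; scikit-learn k-means and the array shapes are illustrative assumptions.

```python
# Hedged sketch of Deep Clustering inference: embed T-F bins, cluster, mask.
import numpy as np
from sklearn.cluster import KMeans

def dpcl_masks(mixture_mag, embed_dim=20, n_speakers=2, seed=0):
    """Cluster per-bin embeddings and return one binary mask per speaker."""
    n_bins, n_frames = mixture_mag.shape
    rng = np.random.default_rng(seed)
    # Stand-in for a trained embedding network: in real DPCL, a network maps
    # each T-F bin of the mixture to an embed_dim-dimensional vector.
    embeddings = rng.standard_normal((n_bins * n_frames, embed_dim))
    labels = KMeans(n_clusters=n_speakers, n_init=10,
                    random_state=seed).fit_predict(embeddings)
    # One binary mask per cluster, reshaped back to the T-F plane.
    return [(labels == k).reshape(n_bins, n_frames).astype(float)
            for k in range(n_speakers)]

masks = dpcl_masks(np.abs(np.random.randn(129, 50)))  # toy mixture magnitudes
```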
“…To enhance the accuracy of weather prediction, Williams, Neilley, Koval, and McDonald () incorporated spatial‐temporal neighborhood bias information and used it to formulate a constraint‐regularized regression problem. Zhang and Wang () presented a multi‐context network, with one network averaging the output of multiple DNNs and the other stacking them together. Guzman, El‐Haliby, and Bruegge () compared the performance of four machine learning methods and their ensembles in classifying app reviews.…”
Section: Literature Review (mentioning)
confidence: 99%
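A minimal sketch of the two ensemble strategies credited to Zhang and Wang in the statement above: averaging the mask estimates of several DNNs versus stacking them as input to a second-stage model. The stand-in estimates and the second-stage linear map are illustrative assumptions, not the cited multi-context network.

```python
# Hedged sketch: averaging vs. stacking the outputs of multiple DNNs.
import numpy as np

def average_ensemble(estimates):
    """Average the per-network mask estimates element-wise."""
    return np.mean(np.stack(estimates, axis=0), axis=0)

def stacking_ensemble(estimates, second_stage):
    """Concatenate estimates and feed them to a second-stage model."""
    return second_stage(np.concatenate(estimates, axis=-1))

# Stand-in mask estimates (frames x bins) from three DNNs trained with
# different context window lengths.
estimates = [np.random.rand(100, 257) for _ in range(3)]
avg_mask = average_ensemble(estimates)

# Stand-in second-stage model: a fixed linear map back to 257 mask bins.
W = np.random.rand(3 * 257, 257) / (3 * 257)
final_mask = stacking_ensemble(estimates, lambda x: x @ W)
```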