2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
DOI: 10.1109/globalsip.2014.7032183

Discriminatively trained recurrent neural networks for single-channel speech separation

Abstract: This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-…
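As context for the abstract, a minimal NumPy sketch of mask estimation in a reduced (Mel-like) feature space: the mask is predicted on B bands rather than F linear-frequency bins, then expanded back for source reconstruction. All sizes, the uniform stand-in filterbank, and the random "network output" are hypothetical, not the paper's actual configuration.

    import numpy as np

    rng = np.random.default_rng(0)
    F, B, T = 257, 40, 100                     # linear bins, reduced bands, frames

    # Hypothetical band-pooling matrix M (B x F); a real system would use a
    # Mel-spaced triangular filterbank here.
    M = np.zeros((B, F))
    edges = np.linspace(0, F, B + 1).astype(int)
    for b in range(B):
        M[b, edges[b]:edges[b + 1]] = 1.0

    mix_mag = rng.random((F, T)) + 1e-3        # |Y|: mixture magnitude spectrogram
    mel_mix = M @ mix_mag                      # reduced-domain features (B x T)

    # Stand-in for the network's output: a mask in (0, 1) on the reduced bands.
    mel_mask = 1.0 / (1.0 + np.exp(-rng.standard_normal((B, T))))

    # Expand the band mask back to linear frequency and reconstruct the source.
    lin_mask = (M.T @ mel_mask) / np.maximum(M.T @ np.ones((B, T)), 1e-8)
    src_est = lin_mask * mix_mag               # estimated source magnitude

    print(mel_mix.shape, lin_mask.shape, src_est.shape)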

Cited by 249 publications (269 citation statements)
References 20 publications

“…There, only the gradient ∂E_SA/∂m̂ of the objective function with respect to the network output is specific to source separation, whereas the rest of the algorithm is unchanged. Using L instead of conventional sigmoid or half-wave activation functions helps reduce the vanishing temporal gradient problem of RNNs [5], allowing them to outperform DNNs with static context windows in speech enhancement [16].…”
Section: Speech Enhancement Methods
confidence: 99%
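The gradient the excerpt refers to can be written out explicitly. A plausible reconstruction under common signal-approximation notation (assumed here, not quoted from the paper: m̂ is the estimated mask, |y| the mixture magnitude, |s| the target-source magnitude):

    E_{\mathrm{SA}} = \sum_{t,f} \bigl( \hat{m}_{t,f}\,|y_{t,f}| - |s_{t,f}| \bigr)^2,
    \qquad
    \frac{\partial E_{\mathrm{SA}}}{\partial \hat{m}_{t,f}} = 2 \bigl( \hat{m}_{t,f}\,|y_{t,f}| - |s_{t,f}| \bigr)\,|y_{t,f}|.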
“…Here, we consider deep recurrent neural networks (DRNNs), as proposed in [16]. The mask m̂_t is estimated by the DRNN forward pass, which is defined as follows, for hidden layers k = 1, …”
Section: Speech Enhancement Methods
confidence: 99%
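To make the truncated forward-pass definition concrete, a minimal sketch of a deep recurrent mask estimator: each hidden layer k = 1..K combines the layer below at frame t with its own state at frame t-1, and the output layer emits a mask in (0, 1). A plain tanh recurrence stands in for the LSTM function L of the excerpt, and all sizes and weights are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    T, F, H, K = 50, 64, 128, 2    # frames, frequency bins, hidden size, layers
    Wx = [rng.standard_normal((H, F if k == 0 else H)) * 0.1 for k in range(K)]
    Wh = [rng.standard_normal((H, H)) * 0.1 for _ in range(K)]
    Wo = rng.standard_normal((F, H)) * 0.1

    def drnn_mask(features):
        """Run the DRNN over a (T, F) feature sequence; return a (T, F) mask."""
        h_prev = [np.zeros(H) for _ in range(K)]
        masks = []
        for t in range(T):
            x = features[t]
            for k in range(K):
                x = np.tanh(Wx[k] @ x + Wh[k] @ h_prev[k])  # input + recurrence
                h_prev[k] = x
            masks.append(sigmoid(Wo @ x))  # constrain the mask to (0, 1)
        return np.stack(masks)

    mask_hat = drnn_mask(rng.random((T, F)))
    print(mask_hat.shape)  # (50, 64)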
“…Previous studies have focused on either developing improved feature extraction methods or using more sophisticated classifiers, for example moving from Gaussian mixture models (GMMs) to deep neural networks (DNNs). Some attention has been focused on improving the classifier to reduce perceptual error by changing the loss function for text-to-speech applications [14], and introducing signal approximation loss functions [15,16] as a replacement for mask approximation within speech separation applications. Signal approximation loss functions apply the output of the network to the noisy spectrum within the loss function, and minimise this with respect to the target.…”
Section: Introduction
confidence: 99%
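For contrast, a short sketch of the two loss families the excerpt distinguishes, under assumed notation: mask approximation (MA) matches the estimated mask to an ideal mask target, while signal approximation (SA) applies the estimated mask to the noisy magnitude inside the loss and matches the result to the clean target.

    import numpy as np

    rng = np.random.default_rng(0)
    T, F = 100, 64
    mix_mag = rng.random((T, F)) + 1e-3        # |Y|: noisy-mixture magnitude
    src_mag = rng.random((T, F)) * mix_mag     # |S|: clean target (kept <= |Y|)
    ideal_mask = src_mag / mix_mag             # e.g. an ideal ratio mask target
    est_mask = np.full((T, F), 0.5)            # stand-in for the network output

    ma_loss = np.mean((est_mask - ideal_mask) ** 2)         # mask approximation
    sa_loss = np.mean((est_mask * mix_mag - src_mag) ** 2)  # signal approximation
    print(ma_loss, sa_loss)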