The 9th International Symposium on Chinese Spoken Language Processing 2014
DOI: 10.1109/iscslp.2014.6936615

Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers

Abstract: In this paper, a novel deep neural network (DNN) architecture is proposed to generate the speech features of both the target speaker and interferer for speech separation. DNN is adopted here to directly model the highly nonlinear relationship between speech features of the mixed signals and the two competing speakers. With the modified output speech features for learning the parameters of the DNN, the generalization capacity to unseen interferers is improved for separating the target speech. Meanwhile, without…

Citations: cited by 56 publications (32 citation statements)
References: 19 publications
“…Contrarily to [2,9], where DNNs are trained of raw spectral features, we train the DNN on SNMF activation coefficients. Hence, to evaluate the influence of the input features of the DNN, we introduce a variant of our framework denoted (DNN-SNMF-Spec), where the DNN is learned on spectral features to predict activation coefficients, and uses the modified cost function computed on signal reconstruction.…”
Section: Methods (mentioning)
confidence: 99%
“…The activation coefficients are then used as input features of the DNN, instead of raw spectral coefficients as in [9] or the log spectrum in [2]. For each frame of noisy speech (at index position $t$), we build a large vector composed of the concatenation of the activation coefficients of speech $\hat{h}_{S,t}$ and noise $\hat{h}_{N,t}$ vectors extracted on each frame on an analysis windows of width $(2K + 1)$ frames centred on the $t$-th frame.…”
Section: Feature Extraction Using Supervised SNMF (mentioning)
confidence: 99%
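
The context-window construction described in the statement above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the citing authors' implementation: the function name `context_vector`, the per-frame ordering (speech activations followed by noise activations), and the handling of the first and last frames by repeating the edge frame are all hypothetical choices that the quoted text does not specify.

```python
def context_vector(h_speech, h_noise, t, K):
    """Concatenate speech and noise activations over frames t-K .. t+K.

    h_speech, h_noise: lists with one activation vector (list of floats) per frame.
    Assumption: near the boundaries the edge frame is repeated, so the
    output always covers exactly 2K + 1 frames.
    """
    n_frames = len(h_speech)
    if len(h_noise) != n_frames:
        raise ValueError("speech and noise activations must cover the same frames")
    if not 0 <= t < n_frames:
        raise IndexError("frame index t is out of range")

    features = []
    for offset in range(-K, K + 1):
        # Clamp the frame index into the valid range (edge repetition).
        idx = min(max(t + offset, 0), n_frames - 1)
        # Assumed ordering within each frame: speech first, then noise.
        features.extend(h_speech[idx])
        features.extend(h_noise[idx])
    return features


# Toy data: 3 frames, 2 speech activations and 1 noise activation per frame.
h_s = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
h_n = [[10.0], [20.0], [30.0]]
centre = context_vector(h_s, h_n, t=1, K=1)
```

With these toy inputs the resulting vector has (2K + 1) × (2 + 1) = 9 entries; in general its length is the window width times the combined number of speech and noise activations per frame.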