2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6854299
Deep neural networks for single channel source separation

Abstract: In this paper, a novel approach for single channel source separation (SCSS) using a deep neural network (DNN) architecture is introduced. Unlike previous studies in which DNN and other classifiers were used for classifying time-frequency bins to obtain hard masks for each source, we use the DNN to classify estimated source spectra to check for their validity during separation. In the training stage, the training data for the source signals are used to train a DNN. In the separation stage, the trained DNN is ut…
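The abstract describes using a trained DNN to score estimated source spectra for validity during separation. The paper's actual architecture and training procedure are not reproduced here; the following is only a generic numpy sketch of a small feed-forward scorer for a candidate spectrum (all names, layer sizes, and the sigmoid output are assumptions, not details from the paper):

```python
import numpy as np

def mlp_score(spectrum, weights, biases):
    """Forward pass of a small fully connected network that scores a
    candidate magnitude spectrum (higher = more plausible for the source).
    `weights`/`biases` are lists of per-layer parameters, assumed trained."""
    h = spectrum
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, W @ h + b)      # ReLU hidden layers
    logits = weights[-1] @ h + biases[-1]   # final linear layer
    return 1.0 / (1.0 + np.exp(-logits))    # sigmoid validity score in [0, 1]
```

During separation, such a scorer could be queried for each candidate source spectrum and the candidates adjusted until the scores indicate valid source estimates.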

Cited by 99 publications (75 citation statements)
References 22 publications (30 reference statements)
“…speech recognition [2] and speech synthesis [3]. More recently, the DNN has been applied to speech separation [4]- [7] and enhancement/denoising [8]- [10], particularly for monaural recordings [4]- [6], [8]- [10]. When processing mixtures of target speech signals and competing noise, speech separation may be considered as speech enhancement.…”
Section: Introduction
confidence: 99%
“…The inputs to the DNN are often (hybrid) features such as timefrequency (TF) domain spectral features [4]- [6], [8]- [10] and filterbank features [4], [5], [11]; while the output can be the TF unit level features that can be used to recover the speech source, such as ideal binary/ratio masks (IBM/IRM) [4]- [6], [11], direct magnitude spectra [9], [10] or their transforms such as log power (LP) spectra [8].…”
Section: Introduction
confidence: 99%
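The excerpt above lists ideal binary/ratio masks (IBM/IRM) among the common DNN output targets. A minimal numpy sketch of how these masks are defined per time-frequency bin, from hypothetical target and noise magnitude spectrograms (array names and shapes are illustrative assumptions):

```python
import numpy as np

def ideal_ratio_mask(target_mag, noise_mag, eps=1e-8):
    """Per-TF-bin ratio of target energy to total energy, in [0, 1]."""
    t2 = target_mag ** 2
    n2 = noise_mag ** 2
    return t2 / (t2 + n2 + eps)

def ideal_binary_mask(target_mag, noise_mag):
    """1 where the target dominates the TF bin, else 0 (a hard mask)."""
    return (target_mag > noise_mag).astype(float)

# Applying a mask to the mixture magnitude yields an estimate of the
# target's magnitude spectrum (simplifying assumption: magnitudes add).
rng = np.random.default_rng(0)
target = np.abs(rng.standard_normal((257, 100)))   # freq bins x frames
noise = np.abs(rng.standard_normal((257, 100)))
mix = target + noise

irm = ideal_ratio_mask(target, noise)
est = irm * mix
```

In mask-based training, a DNN is fit to predict such masks from mixture features; at test time the predicted mask is applied to the mixture spectrogram before inverting back to a waveform.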
“…Many SCBSS solutions in various approaches have been proposed but the dominant approach is the non-negative matrix factorization (NMF) [4][5][6][7] and its variants. [8][9][10][11][12][13][14][15] In this paper, we introduced a new method based on complex non-negative matrix factorization (CMF).…”
Section: Introduction
confidence: 99%
“…INTRODUCTION Much work in audio source separation has been inspired by the ability of human listeners to maintain separate auditory neural and perceptual representations of competing speech in 'cocktail party' listening scenarios [1]- [3]. A common engineering approach is to decompose a mixed audio signal, comprising two or more competing speech signals, into a spectrogram in order to assign each time-frequency element to the respective sources [4]- [6]. Hence, this form of source separation may be interpreted as a classification problem.…”
confidence: 99%
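The last excerpt frames separation as assigning each time-frequency element of the mixture spectrogram to one of the competing sources, i.e., a classification problem. A toy sketch of that assignment given estimated per-source magnitudes (the function name and array shapes are assumptions):

```python
import numpy as np

def assign_tf_bins(source_mags):
    """Assign each TF bin to the source with the largest estimated
    magnitude, yielding one hard (binary) mask per source."""
    stack = np.stack(source_mags)       # (n_sources, freq bins, frames)
    labels = np.argmax(stack, axis=0)   # per-bin class label
    return [(labels == k).astype(float) for k in range(len(source_mags))]
```

Each resulting mask can be applied to the mixture's complex STFT and inverted to recover that source's waveform; the masks partition the bins, so they sum to one everywhere.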