Deep neural networks for single channel source separation

Grais, Emad M.; Şen, Mehmet Umut; Erdoğan, Hakan

doi:10.1109/icassp.2014.6854299

Cited by 99 publications

(75 citation statements)

References 22 publications

(30 reference statements)

“…speech recognition [2] and speech synthesis [3]. More recently, the DNN has been applied to speech separation [4]-[7] and enhancement/denoising [8]-[10], particularly for monaural recordings [4]-[6], [8]-[10]. When processing mixtures of target speech signals and competing noise, speech separation may be considered as speech enhancement.…”

Section: Introduction (mentioning)

confidence: 99%

“…The inputs to the DNN are often (hybrid) features such as time-frequency (TF) domain spectral features [4]-[6], [8]-[10] and filterbank features [4], [5], [11]; while the output can be the TF unit level features that can be used to recover the speech source, such as ideal binary/ratio masks (IBM/IRM) [4]-[6], [11], direct magnitude spectra [9], [10] or their transforms such as log power (LP) spectra [8].…”

Section: Introduction (mentioning)

confidence: 99%

“…In order to recover the underlying target speech embedded in noise, most of the deep neural networks, either recurrent [4], [5], [10] or feedforward [4], [6], [8], [9], [11], are trained to optimize some objective functions such as the mean squared error (MSE) between the true and predicted outputs. The inputs to the DNN are often (hybrid) features such as time-frequency (TF) domain spectral features [4]-[6], [8]-[10] and filterbank features [4], [5], [11]; while the output can be the TF unit level features that can be used to recover the speech source, such as ideal binary/ratio masks (IBM/IRM) [4]-[6], [11], direct magnitude spectra [9], [10] or their transforms such as log power (LP) spectra [8].…”

Section: Introduction (mentioning)

confidence: 99%

A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions

Liu, Wang, Jackson, et al. 2017

2017 25th European Signal Processing Conference (EUSIPCO)

Abstract: Deep neural networks (DNN) have recently been shown to give state-of-the-art performance in monaural speech enhancement. However, in the DNN training process, the perceptual difference between different components of the DNN output is not fully exploited, where equal importance is often assumed. To address this limitation, we have proposed a new perceptually-weighted objective function within a feedforward DNN framework, aiming to minimize the perceptual difference between the enhanced speech and the target speech. A perceptual weight is integrated into the proposed objective function, and has been tested on two types of output features: spectra and ideal ratio masks. Objective evaluations for both speech quality and speech intelligibility have been performed. Integration of our perceptual weight shows consistent improvement on several noise levels and a variety of different noise types.
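The statements quoted above name ideal binary and ratio masks (IBM/IRM) as training targets and mean squared error as the usual objective. The following is a minimal NumPy sketch of those quantities, assuming the clean speech and noise magnitude spectrograms are both available. The function names, the 0 dB threshold, and the `weight` argument are illustrative assumptions; in particular, `weight` is only a placeholder and does not reproduce the perceptual weighting proposed in the cited paper.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """IRM per time-frequency unit: speech energy over total energy (one common definition)."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return s2 / (s2 + n2 + eps)

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0, eps=1e-8):
    """IBM: 1 where the local SNR in dB exceeds lc_db (assumed threshold), else 0."""
    snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (snr_db > lc_db).astype(np.float64)

def weighted_mse(pred, target, weight=None):
    """Mean squared error with an optional per-unit weight (placeholder, all ones by default)."""
    if weight is None:
        weight = np.ones_like(target)
    return float(np.mean(weight * (pred - target) ** 2))

# Toy spectrograms with one frame and two frequency bins.
speech = np.array([[3.0, 1.0]])
noise = np.array([[1.0, 3.0]])
irm = ideal_ratio_mask(speech, noise)
ibm = ideal_binary_mask(speech, noise)
loss = weighted_mse(irm, ibm)
```

In a training setup, `pred` would be the network output for a noisy input frame and `target` the mask or spectrum computed from the clean reference.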

“…Many SCBSS solutions in various approaches have been proposed but the dominant approach is the non-negative matrix factorization (NMF) [4]-[7] and its variants [8]-[15]. In this paper, we introduced a new method based on complex non-negative matrix factorization (CMF).…”

Section: Introduction (mentioning)

confidence: 99%

Single-channel blind separation using L1-sparse complex non-negative matrix factorization for acoustic signals

Parathai, Woo, Dlay, et al. 2015

The Journal of the Acoustical Society of America

An innovative method of single-channel blind source separation is proposed. The proposed method is a complex-valued non-negative matrix factorization with probabilistically optimal L1-norm sparsity. This preserves the phase information of the source signals and enforces the inherent structures of the temporal codes to be optimally sparse, thus resulting in more meaningful parts factorization. An efficient algorithm with closed-form expression to compute the parameters of the model including the sparsity has been developed. Real-time acoustic mixtures recorded from a single-channel are used to verify the effectiveness of the proposed method.
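For orientation, the sketch below shows the simpler, real-valued relative of the method in this abstract: non-negative matrix factorization of a magnitude spectrogram with a fixed L1 penalty on the activations, using multiplicative updates for the Euclidean cost. It is not the complex-valued factorization with probabilistically optimal sparsity that the abstract describes; the function name `sparse_nmf` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def sparse_nmf(V, rank, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (freq x time) as W @ H with an L1 penalty lam on H."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        # Multiplicative update for H; lam in the denominator shrinks H toward zero.
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # Multiplicative update for W (Euclidean cost, no penalty on W).
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Remove the scale ambiguity: normalize columns of W and
        # rescale the rows of H so that W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Toy non-negative "spectrogram" built from three hidden components.
_rng = np.random.default_rng(1)
V = _rng.random((20, 3)) @ _rng.random((3, 30))
W, H = sparse_nmf(V, rank=3, lam=0.01)
```

Each rank-one term `W[:, k:k+1] @ H[k:k+1, :]` is one spectral component; a separation system would then group components by source, a step this sketch leaves out.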

“…Much work in audio source separation has been inspired by the ability of human listeners to maintain separate auditory neural and perceptual representations of competing speech in 'cocktail party' listening scenarios [1]-[3]. A common engineering approach is to decompose a mixed audio signal, comprising two or more competing speech signals, into a spectrogram in order to assign each time-frequency element to the respective sources [4]-[6]. Hence, this form of source separation may be interpreted as a classification problem.…”

mentioning

confidence: 99%

Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

Simpson, Roma, Plumbley 2015

Latent Variable Analysis and Signal Separation

Abstract: Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of voice and non-voice in the context of musical mixtures. Here, we trained a convolutional DNN (of around a billion parameters) to provide probabilistic estimates of the ideal binary mask for separation of vocal sounds from real-world musical mixtures. We contrast our DNN results with more traditional linear methods. Our approach may be useful for automatic removal of vocal sounds from musical mixtures for 'karaoke' type applications.
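The classification view in this entry, where each time-frequency element is assigned to one source, can be sketched as follows. The code assumes a complex mixture spectrogram and a same-shaped array of per-element vocal probabilities, for example from a trained network. The function name `split_by_mask` and the 0.5 threshold are illustrative assumptions and are not taken from the cited paper.

```python
import numpy as np

def split_by_mask(mixture_spec, vocal_prob, threshold=0.5):
    """Assign each time-frequency element to vocals or accompaniment.

    mixture_spec: complex spectrogram of the mixture.
    vocal_prob:   estimated probability that each element belongs to the vocals.
    """
    mask = vocal_prob > threshold              # hard decision per element
    vocals = np.where(mask, mixture_spec, 0)   # keep mixture magnitude and phase
    accompaniment = mixture_spec - vocals      # everything not assigned to vocals
    return vocals, accompaniment

# Toy 2 x 2 complex spectrogram and probability estimates.
mixture = np.array([[1 + 1j, 2 + 0j], [0 + 3j, 4 - 4j]])
prob = np.array([[0.9, 0.2], [0.5, 0.7]])
vocals, accompaniment = split_by_mask(mixture, prob)
```

Because the mask is binary, the two outputs sum back to the mixture exactly; for a karaoke-style use, only `accompaniment` would be inverted to a waveform.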
