2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6854294

Single-channel speech separation with memory-enhanced recurrent neural networks

Abstract: In this paper we propose the use of Long Short-Term Memory recurrent neural networks for speech enhancement. Networks are trained to predict clean speech as well as noise features from noisy speech features, and a magnitude domain soft mask is constructed from these features. Extensive tests are run on 73 k noisy and reverberated utterances from the Audio-Visual Interest Corpus of spontaneous, emotionally colored speech, degraded by several hours of real noise recordings comprising stationary and non-stationar…
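As a rough illustration of the approach described in the abstract, the sketch below builds a two-layer LSTM that predicts clean-speech and noise magnitude features from noisy features and combines them into a magnitude-domain soft mask. The layer sizes, feature dimension, and optimizer are assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch of the LSTM mask-estimation idea from the abstract, assuming
# magnitude spectral features. Layer sizes, feature dimension, and optimizer
# are illustrative placeholders, not the authors' configuration.
import tensorflow as tf

n_bins = 257  # e.g. 512-point STFT -> 257 magnitude bins (assumption)

inputs = tf.keras.Input(shape=(None, n_bins))  # (time, freq) noisy features
x = tf.keras.layers.LSTM(256, return_sequences=True)(inputs)
x = tf.keras.layers.LSTM(256, return_sequences=True)(x)
speech_hat = tf.keras.layers.Dense(n_bins, activation="relu", name="speech")(x)
noise_hat = tf.keras.layers.Dense(n_bins, activation="relu", name="noise")(x)
model = tf.keras.Model(inputs, [speech_hat, noise_hat])
model.compile(optimizer="adam", loss="mse")

def soft_mask(speech_mag, noise_mag, eps=1e-8):
    """Magnitude-domain soft mask built from the predicted speech and noise."""
    return speech_mag / (speech_mag + noise_mag + eps)

# Enhancement: multiply the noisy magnitude spectrogram by the mask and
# resynthesize with the noisy phase (ISTFT not shown).
```

The two regression outputs mirror the paper's strategy of predicting both clean speech and noise and forming the mask from their ratio; the exact feature type and network depth are left open here.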

Cited by 106 publications (67 citation statements)
References 20 publications (27 reference statements)
“…Supervised learning of time-frequency masks for the noisy spectrum has been investigated in [11][12][13][14][15], using stereo training data in which noisy speech is the input, and a target time-frequency mask based on the corresponding clean speech data forms the output. Subsequent work [6] focused on modeling dynamics well using long short-term memory (LSTM) recurrent neural networks which helped achieve state of the art performance on a difficult task with nonstationary interference.…”
Section: Introduction
confidence: 99%
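To make the "stereo training data" idea in the quoted passage concrete, here is a minimal sketch that pairs a noisy spectrogram (the network input) with a time-frequency mask target computed from the parallel clean and noise signals. The ideal-ratio-mask form, the STFT settings, and the helper names are illustrative assumptions, not necessarily the exact targets used in [11]-[15].

```python
# Hedged sketch of mask-target construction from parallel (stereo) data:
# the network input is the noisy spectrogram, the target is a time-frequency
# mask derived from the corresponding clean and noise signals.
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT; a library STFT (scipy/librosa) would serve equally well."""
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1))

def mask_target(clean, noise, eps=1e-8):
    """Ideal ratio mask computed from the parallel clean and noise signals."""
    s, n = stft_mag(clean), stft_mag(noise)
    return s / (s + n + eps)

# Training pair: input = stft_mag(clean + noise), target = mask_target(clean, noise)
```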
“…II and the learning rates given in Table II. 5 value originally proposed in [48] and currently default in https://keras.io/. Figure 3 presents the waveforms of a specific 10 ms realization of clean, noisy, and enhanced speech signals from the experiments in Table I.…”
Section: B. Signal Integrity vs. Performance Metric
confidence: 99%
“…"mask" corresponds to the first channel ofs in (8) and "filter" to y, the output of the GEVD-MWF in (4). The best result in each situation is shown in bold.…”
Section: Setup
confidence: 99%
“…DNNs were originally applied to single-channel inputs to derive a single-channel filter, a.k.a. a mask [8][9][10]. In the multichannel case, several approaches have been proposed to pass spatial information directly to a DNN, for instance using phase difference features between non-coincident microphones [11] or coherence features [12].…”
Section: Introduction
confidence: 99%
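As a small illustration of the inter-microphone phase-difference features mentioned in the last passage (cf. [11]), the sketch below derives cos/sin features from each channel's phase difference to a reference microphone. The reference-channel pairing and the cos/sin encoding are assumptions for illustration, not the cited paper's exact feature set.

```python
# Rough sketch of inter-microphone phase-difference features (cf. [11]);
# reference-channel choice and cos/sin encoding are illustrative assumptions.
import numpy as np

def phase_difference_features(X):
    """X: complex STFTs, shape (mics, frames, bins).
    Returns cos/sin of each channel's phase difference to the first microphone."""
    diffs = np.angle(X[1:] * np.conj(X[0]))  # (mics-1, frames, bins)
    return np.concatenate([np.cos(diffs), np.sin(diffs)], axis=0)
```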