2017
DOI: 10.1109/taslp.2017.2696307

Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising

Abstract: In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct…
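The abstract's definition is truncated above, but the complex ideal ratio mask (cIRM) is conventionally defined as the complex-domain ratio of the target STFT to the mixture STFT, so that applying it recovers both magnitude and phase. A minimal NumPy sketch under that assumption (array names are illustrative, not from the paper):

```python
# Sketch of the complex ideal ratio mask: M = S / Y in the complex STFT
# domain, expanded into real and imaginary parts. A small eps regularizes
# the division, as the truncated abstract does not specify the exact form.
import numpy as np

def complex_ideal_ratio_mask(stft_clean: np.ndarray, stft_noisy: np.ndarray,
                             eps: float = 1e-8) -> np.ndarray:
    """stft_clean: complex STFT of the target speech, shape (freq, time).
    stft_noisy: complex STFT of the reverberant-noisy mixture, same shape.
    Returns M such that M * stft_noisy ~= stft_clean (complex product)."""
    yr, yi = stft_noisy.real, stft_noisy.imag
    sr, si = stft_clean.real, stft_clean.imag
    denom = yr**2 + yi**2 + eps              # |Y|^2, kept away from zero
    mask_real = (yr * sr + yi * si) / denom  # real part of S * conj(Y) / |Y|^2
    mask_imag = (yr * si - yi * sr) / denom  # imaginary part
    return mask_real + 1j * mask_imag

# Applying the mask enhances magnitude and phase jointly:
# stft_enhanced = complex_ideal_ratio_mask(S, Y) * Y
```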

Cited by 197 publications (112 citation statements) · References 39 publications
“…The loss function used for the proposed method, Eqs. (5) and (6), was also used for the conventional method. The DNNs in the proposed and conventional methods were trained for 300 epochs, where each epoch contained 2893 utterances randomly selected from the training set, and the mini-batch size was 1.…”
Section: DNN Architecture, Loss Function, and Training Setup (mentioning)
confidence: 99%
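A minimal PyTorch sketch of the training setup quoted above: 300 epochs, 2893 utterances drawn at random from the training set per epoch, and a mini-batch size of 1. The model and loss function are placeholders, since Eqs. (5) and (6) of the citing paper are not reproduced here:

```python
# Hypothetical training loop matching the quoted setup; the dataset, model,
# and loss_fn are assumptions standing in for the citing paper's specifics.
import random
import torch

def train(model, train_set, loss_fn, epochs=300, utts_per_epoch=2893, lr=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        # Randomly select 2893 utterances from the training set each epoch.
        for idx in random.sample(range(len(train_set)), k=utts_per_epoch):
            features, target = train_set[idx]
            prediction = model(features.unsqueeze(0))  # mini-batch of size 1
            loss = loss_fn(prediction, target.unsqueeze(0))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```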
“…They are chosen in this way so that the scale of all the terms is almost the same. The regularization term for the generator is a cosine similarity loss instead of the L1 loss widely used in other GAN methods [4,25]. We add Gaussian noise with mean 0.0 and variance 0.01 between the encoder and the decoder of the generator.…”
Section: Generative Model (GAN) (mentioning)
confidence: 99%
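A sketch of the two details quoted above: a cosine similarity regularizer for the generator in place of the usual L1 term, and Gaussian noise with mean 0.0 and variance 0.01 injected between the generator's encoder and decoder. The encoder/decoder modules and tensor shapes are assumptions, not the citing paper's architecture:

```python
# Hypothetical PyTorch sketch of the quoted GAN details.
import torch
import torch.nn as nn
import torch.nn.functional as F

def cosine_similarity_loss(prediction: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """1 - cosine similarity, averaged over the batch: minimized when the
    prediction and target vectors point in the same direction."""
    pred = prediction.flatten(start_dim=1)
    tgt = target.flatten(start_dim=1)
    return (1.0 - F.cosine_similarity(pred, tgt, dim=1)).mean()

class Generator(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder
        self.noise_std = 0.1  # std = sqrt(0.01), i.e. variance 0.01

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        latent = self.encoder(x)
        if self.training:  # Gaussian noise between encoder and decoder
            latent = latent + self.noise_std * torch.randn_like(latent)
        return self.decoder(latent)
```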
“…This approach combines the flexibility of unsupervised NMF-based speech enhancement, which requires no prior knowledge of the differences between speech and noise characteristics, with online operation that allows real-time use. RT-GCC-NMF generalizes to unseen speakers, acoustic environments, and recording setups from very little unlabeled training data: on the order of one thousand 64 ms frames, compared to the hours of labeled training data required for deep learning approaches [3]. The pre-learned NMF dictionary is also very fast to train, on the order of seconds or minutes, in contrast with the hours required to train deep neural networks.…”
Section: Introduction (mentioning)
confidence: 99%
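To illustrate the pre-learned dictionary step quoted above, here is a sketch of learning an NMF dictionary from roughly one thousand 64 ms frames of unlabeled audio. It uses scikit-learn's generic NMF rather than the authors' RT-GCC-NMF implementation, and the frame/FFT sizes are illustrative:

```python
# Hypothetical sketch: pre-learning a spectral NMF dictionary from a small
# amount of unlabeled data. Not the authors' RT-GCC-NMF code.
import numpy as np
from sklearn.decomposition import NMF

def learn_nmf_dictionary(frames: np.ndarray, n_atoms: int = 128) -> np.ndarray:
    """frames: non-negative magnitude spectra, shape (n_frames, n_bins).
    Returns a dictionary of spectral atoms, shape (n_atoms, n_bins)."""
    model = NMF(n_components=n_atoms, init="nndsvda", max_iter=500)
    model.fit(frames)            # factorizes frames ~= W @ components_
    return model.components_     # the learned spectral dictionary

# Example: ~1000 frames of 64 ms at 16 kHz (1024 samples -> 513 FFT bins);
# random non-negative data stands in for real magnitude spectra here.
rng = np.random.default_rng(0)
frames = np.abs(rng.standard_normal((1000, 513)))
dictionary = learn_nmf_dictionary(frames)
```

Because the factorization runs on only a thousand frames, this step completes in seconds to minutes, consistent with the training-cost contrast drawn in the quoted statement.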