Interspeech 2016
DOI: 10.21437/interspeech.2016-88

Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks

Abstract: In this paper we consider the problem of speech enhancement in real-world-like conditions where multiple noises can simultaneously corrupt speech. Most of the current literature on speech enhancement focuses primarily on the presence of a single noise in corrupted speech, which is far from real-world environments. Specifically, we deal with improving speech quality in an office environment, where multiple stationary as well as non-stationary noises can be simultaneously present in speech. We propose several strategies base…

Cited by 99 publications (72 citation statements)
References 25 publications
“…The latter seems a promising way to improve objective quality, although both models have to incorporate the standard MSE due to the band limitation of each objective measure. [21] reported that a simple perceptually weighted wide-band MSE alone does not improve objective speech quality or intelligibility, suggesting that the MSE is still a reliable learning objective for wide-band speech enhancement.…”
Section: Introduction
confidence: 99%
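
The contrast drawn in this statement between a standard MSE and a perceptually weighted MSE is easy to state in code. Below is a minimal sketch over magnitude spectra; the weight curve is purely illustrative and is not the weighting evaluated in [21].

```python
import numpy as np

def mse(clean_mag, est_mag):
    """Standard MSE between clean and estimated magnitude spectra."""
    return np.mean((clean_mag - est_mag) ** 2)

def perceptually_weighted_mse(clean_mag, est_mag, weights):
    """MSE with a per-frequency-bin perceptual weight on the squared error."""
    return np.mean(weights * (clean_mag - est_mag) ** 2)

# Hypothetical weight curve: emphasize lower-frequency bins over higher ones.
n_bins = 257
weights = np.linspace(1.0, 0.3, n_bins)

clean = np.abs(np.random.randn(100, n_bins))  # stand-in spectra (frames x bins)
est = np.abs(np.random.randn(100, n_bins))
print(mse(clean, est), perceptually_weighted_mse(clean, est, weights))
```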
“…We down-sample the input audio from 48 kHz to 16 kHz. And during training, similar to [4], we divided the original audio into overlapped slices with the stride 2^13, each of which has 2^14 samples (approximately 1 second). During testing, as in [4], we divide the test utterance into non-overlapped slices and concatenate the results as the final enhanced speech for the whole duration.…”
Section: Discriminative Model
confidence: 99%
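
The slicing scheme this statement describes is straightforward to reproduce. The sketch below assumes 16 kHz mono audio in a NumPy array; the function names are illustrative and not taken from [4].

```python
import numpy as np

SLICE_LEN = 2 ** 14  # 16384 samples, roughly 1 second at 16 kHz
STRIDE = 2 ** 13     # 50% overlap between consecutive training slices

def train_slices(audio):
    """Overlapped training slices: windows of 2**14 samples, hop of 2**13."""
    return [audio[i:i + SLICE_LEN]
            for i in range(0, len(audio) - SLICE_LEN + 1, STRIDE)]

def test_slices(audio):
    """Non-overlapped test slices; the last partial slice is zero-padded."""
    pad = (-len(audio)) % SLICE_LEN
    return np.pad(audio, (0, pad)).reshape(-1, SLICE_LEN)

def reassemble(enhanced_slices, original_len):
    """Concatenate enhanced slices and trim back to the original length."""
    return np.concatenate(enhanced_slices)[:original_len]
```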
“…Advances in deep learning recently improved the performance of noise reduction algorithms [1,2,3]. It is common practice to transform the noisy time-domain signal into a time-frequency representation, for instance using a short-time Fourier transform (STFT).…”
Section: Introduction
confidence: 99%
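
As a concrete illustration of the STFT front end this last statement mentions, the sketch below transforms a noisy signal to a time-frequency representation and resynthesizes it. The identity mask and frame parameters (512-sample frames, 50% overlap) are placeholder assumptions, not values from [1,2,3].

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
noisy = np.random.randn(fs)  # stand-in for a 1-second noisy waveform

# Complex STFT: 32 ms frames (512 samples) with 50% overlap.
f, t, spec = stft(noisy, fs=fs, nperseg=512, noverlap=256)
mag, phase = np.abs(spec), np.angle(spec)

# A mask-based enhancer would predict a gain per time-frequency bin,
# apply it to the magnitude, and resynthesize with the noisy phase.
mask = np.ones_like(mag)  # identity mask as a placeholder
_, enhanced = istft(mask * mag * np.exp(1j * phase), fs=fs,
                    nperseg=512, noverlap=256)
```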