2017 25th European Signal Processing Conference (EUSIPCO)
DOI: 10.23919/eusipco.2017.8081412
A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions

Abstract: Deep neural networks (DNN) have recently been shown to give state-of-the-art performance in monaural speech enhancement. However, in the DNN training process, the perceptual difference between different components of the DNN output is not fully exploited; equal importance is often assumed. To address this limitation, we have proposed a new perceptually-weighted objective function within a feedforward DNN framework, aiming to minimize the perceptual difference between the enhanced speech and the t…

Cited by 30 publications (30 citation statements). References 14 publications.
“…Furthermore, in [18], the estimation of a metric score can be obtained by the discriminator of a generative adversarial network, and the generator then uses the prediction to decide for a gradient direction for optimization. Also, some perceptual weighting rules are proposed in loss functions for neural network training to achieve a better perceptual quality, e.g., high-energy frequency areas are emphasized in the loss functions in [19,20]. However, a perceptual model is not considered in [19] and the loss function proposed in [20] contains an empirical function to balance boosting high energy components and suppressing low energy components.…”
(mentioning, confidence: 99%)
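The weighting rules mentioned above can be illustrated with a minimal sketch. The following is not the exact formulation from [19] or [20], but a simplified stand-in: a mean-squared error over time-frequency (T-F) magnitude spectrograms in which the per-bin weight grows with the clean-speech energy, so high-energy regions dominate the loss. The exponent `alpha` is a hypothetical tuning parameter.

```python
import numpy as np

def weighted_mse_loss(enhanced, clean, alpha=0.5):
    """Perceptually weighted MSE over (frames, bins) magnitude spectrograms.

    A simplified stand-in for energy-based weighting rules: bins with
    higher clean-speech energy receive larger weights.
    """
    energy = clean ** 2
    # Normalize energy per frame to [0, 1], then raise to `alpha` so that
    # high-energy bins contribute more to the loss than low-energy bins.
    w = (energy / (energy.max(axis=1, keepdims=True) + 1e-12)) ** alpha
    return float(np.mean(w * (enhanced - clean) ** 2))
```

With this weighting, the same absolute estimation error is penalized more heavily when it falls in a high-energy region of the clean spectrogram than in a low-energy one.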
“…Second, global MSE optimization usually obtains an oversmoothing estimation which omits some important detailed information. To solve these problems, many new criteria, that consider speech perception, have been proposed in most recent years [9][10][11][12]. The first one is to use perceptually weighted MSE functions, which are proposed to weight the loss in different time-frequency (T-F) regions [10,13].…”
Section: Introduction (mentioning, confidence: 99%)
“…To solve these problems, many new criteria, that consider speech perception, have been proposed in most recent years [9][10][11][12]. The first one is to use perceptually weighted MSE functions, which are proposed to weight the loss in different time-frequency (T-F) regions [10,13]. The second one is to use objective metrics as loss functions, for examples, perceptual evaluation speech quality (PESQ) [14], short-time objective intelligibility (STOI) [15] and scale-invariant speech distortion ratio (SI-SDR) [16] have been adopted as loss functions.…”
Section: Introduction (mentioning, confidence: 99%)
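Of the objective metrics named above, SI-SDR is the simplest to use directly as a loss, since it is differentiable and computable in closed form (PESQ and STOI require surrogate or approximated formulations). A minimal NumPy sketch of negative SI-SDR as a training loss, following the standard definition:

```python
import numpy as np

def si_sdr_loss(estimate, reference, eps=1e-12):
    """Negative scale-invariant SDR in dB (lower is better, as a loss).

    Projects the estimate onto the reference to obtain the
    scale-invariant target, then measures the residual energy.
    """
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    si_sdr = 10.0 * np.log10(
        (np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps)
    )
    return -si_sdr
```

Because of the projection step, rescaling a perfect estimate does not change the loss, which is the "scale-invariant" property that makes SI-SDR attractive as a training criterion.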
“…Sound-source enhancement (SSE) is used to recover the target sound from a noisy observed signal. A recent advancement of SSE is the use of a deep neural network (DNN) to estimate a time-frequency (T-F) mask in the short-time Fourier transform (STFT) domain [1][2][3][4][5][6][7]. In early studies, the mean-squared error (MSE) is used as the cost function to train the parameters of DNN [1,8,9] because a gradient of MSE with respect to the parameters can be calculated analytically.…”
Section: Introduction (mentioning, confidence: 99%)
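The T-F mask estimation setup described above can be sketched as follows. This is a generic illustration, not any cited paper's exact recipe: the DNN's training target is an ideal ratio mask (IRM) computed from clean speech and noise magnitudes in the STFT domain, and the cost is the plain MSE between the predicted and ideal masks.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Ideal ratio mask in the STFT magnitude domain.

    Values lie in [0, 1]: close to 1 where speech dominates a T-F bin,
    close to 0 where noise dominates.
    """
    return speech_mag ** 2 / (speech_mag ** 2 + noise_mag ** 2 + eps)

def mask_mse(predicted_mask, speech_mag, noise_mag):
    """MSE between a predicted T-F mask and the ideal ratio mask,
    i.e. the analytic-gradient cost mentioned in the excerpt above."""
    irm = ideal_ratio_mask(speech_mag, noise_mag)
    return float(np.mean((predicted_mask - irm) ** 2))
```

At inference time the predicted mask is applied bin-wise to the noisy STFT magnitude; the analytic gradient of this MSE with respect to the network parameters is what made it the default cost in early mask-based systems.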