2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
DOI: 10.1109/waspaa.2019.8937189

A Perceptual Weighting Filter Loss for DNN Training In Speech Enhancement

Abstract: Single-channel speech enhancement with deep neural networks (DNNs) has shown promising performance and is thus being intensively studied. In this paper, instead of applying the mean squared error (MSE) as the loss function during DNN training for speech enhancement, we design a perceptual weighting filter loss motivated by the weighting filter as it is employed in analysis-by-synthesis speech coding, e.g., in code-excited linear prediction (CELP). The experimental results show that the proposed simple loss function…

Cited by 21 publications (26 citation statements); references 24 publications.
“…As can be observed, all frequency bins have equal importance, without any perceptual considerations such as the masking property of the human ear [29] or the loudness difference [31]. Furthermore, as the MSE loss is optimized in a global fashion, the network may learn to completely attenuate some regions of the noisy spectrum where the noise component is significantly higher compared to the speech component.…”
Section: Baseline Losses, 2.3.1 Baseline MSE (mentioning, confidence: 99%)
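The baseline this statement criticizes is the plain spectral MSE, which weights every time-frequency bin identically. Below is a minimal sketch of such a loss, assuming PyTorch and freely chosen tensor names; it illustrates the criticized baseline and is not code from the paper or the citing work.

```python
import torch

def spectral_mse_loss(s_hat: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """Plain MSE between estimated and clean spectra.

    s_hat, s: tensors of shape (batch, frames, bins). Every bin
    contributes equally to the loss, with no account taken of
    masking properties or loudness differences across frequency.
    """
    return torch.mean((s_hat - s) ** 2)
```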
“…The parameters of the deep learning architectures are then optimized by minimizing the MSE between the inferred results and their corresponding targets. In reality, optimizing the MSE loss during training guarantees no perceptual quality for either the speech component or the residual noise component, which leads to limited performance [27]–[36]. This effect is even more evident when the level of the noise component is significantly higher than that of the speech component in some regions of the noisy speech spectrum, which explains the poor performance at lower SNR conditions when training with MSE.…”
Section: Introduction (mentioning, confidence: 99%)
“…To do so, we computed cost functions on outputs at three stages of the training. The first loss is the perceptually weighted error of the log-magnitude spectra, i.e.,

$$\mathcal{L}_1 = \sum_{m}\sum_{k} \left\| w(m,k)\,\big( S(m,k) - \hat{S}(m,k) \big) \right\|_1, \tag{1}$$

where $S(m,k) = \log\!\big(|X(m,k)| + 10^{-8}\big)$ is the log-magnitude spectrum of the clean speech $X(m,k)$, $\hat{S}(m,k)$ is the 65-dimensional log-magnitude spectrum output from DNN1, and $w(m,k)$ is the perceptual weight function widely adopted in speech codecs [20]. The second loss counts the estimation error of envelopes, i.e.,…”
Section: Neural Network-based ACE (NNACE) (mentioning, confidence: 99%)
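To make Eq. (1) concrete, here is a hedged sketch of the weighted log-magnitude L1 loss. The framework (PyTorch), tensor names, and shapes are assumptions; the weights w(m,k) would come from a codec-style perceptual weight function such as the one referenced in [20].

```python
import torch

def weighted_log_mag_loss(s_hat_log: torch.Tensor,
                          x_mag: torch.Tensor,
                          w: torch.Tensor) -> torch.Tensor:
    """Perceptually weighted L1 loss on log-magnitude spectra, as in Eq. (1).

    s_hat_log: DNN output, log-magnitude spectrum, shape (frames, bins)
    x_mag:     clean-speech magnitude spectrum |X(m,k)|, same shape
    w:         perceptual weight w(m,k) per bin, same shape
    """
    s_log = torch.log(x_mag + 1e-8)  # S(m,k) = log(|X(m,k)| + 1e-8)
    return torch.sum(torch.abs(w * (s_log - s_hat_log)))
```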
“…A masking-based weighting is used to mask noise at frequencies where speech energy is dominant [28]. A perceptual weighting filter loss that places less emphasis near formant peaks and more emphasis on spectral valleys has been shown to improve perceptual quality [29, 30]. Some commonly used metrics in SE are also modified and redesigned as loss functions to improve performance.…”
Section: Introduction (mentioning, confidence: 99%)
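The weighting filter behind the loss discussed in [29, 30] is the CELP-style filter W(z) = A(z/γ1)/A(z/γ2), which attenuates the error signal near formant peaks and emphasizes it in spectral valleys. The following is a minimal sketch, assuming NumPy/SciPy and typical codec values for γ1, γ2; it is not the paper's reference implementation.

```python
import numpy as np
from scipy.signal import lfilter

def perceptual_weighting(err: np.ndarray, a: np.ndarray,
                         gamma1: float = 0.92,
                         gamma2: float = 0.6) -> np.ndarray:
    """Filter an error signal through W(z) = A(z/gamma1) / A(z/gamma2).

    err: time-domain error (clean minus enhanced speech) for one frame
    a:   LPC coefficients [1, a_1, ..., a_p] of the clean-speech frame
    """
    k = np.arange(len(a))
    num = a * gamma1 ** k  # coefficients of A(z/gamma1)
    den = a * gamma2 ** k  # coefficients of A(z/gamma2)
    return lfilter(num, den, err)  # perceptually weighted error signal
```

Training would then minimize the energy of this weighted error, steering the residual noise toward formant regions where the human ear masks it, which is the effect the citation statement above describes.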