ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054578

Stable Training of DNN for Speech Enhancement Based on Perceptually-Motivated Black-Box Cost Function

Abstract: Improving subjective sound quality of enhanced signals is one of the most important missions in speech enhancement. For evaluating the subjective quality, several methods related to perceptually-motivated objective sound quality assessment (OSQA) have been proposed such as PESQ (perceptual evaluation of speech quality). However, direct use of such measures for training deep neural network (DNN) is not allowed in most cases because popular OSQAs are non-differentiable with respect to DNN parameters. Therefore, t…

Cited by 27 publications (16 citation statements)
References 18 publications
“…They are used as the high level abstraction to measure the training loss between reconstructed signals and reference signals. Such training loss is also called deep feature loss [71], [72].…”
Section: Perceptual Loss For Style Reconstruction (mentioning)
confidence: 99%
“…for hearing aids [5]-[9]. PESQ has also been proposed as loss function for supervised learning [10], [11].…”
Section: Introduction (mentioning)
confidence: 99%
“…Inspired by progresses in black-box function optimization [26], [27], we previously proposed a generative adversarial network (GAN)-based system [28] for near-end intelligibility enhancement. The system was composed of a generator that enhances the intelligibility of input speech and a discriminator that acts as a learned surrogate of evaluation metrics to guide the training scheme of the generator.…”
Section: Introduction (mentioning)
confidence: 99%