Interspeech 2016
DOI: 10.21437/interspeech.2016-224
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement

Abstract: In this paper, we propose a novel progressive learning (PL) framework for deep neural network (DNN) based speech enhancement. It aims at decomposing the complicated regression problem of mapping noisy to clean speech into a series of subproblems, improving system performance and reducing model complexity. As an illustration, we design a signal-to-noise ratio (SNR) based PL architecture by explicitly guiding each hidden layer of the DNN to learn an intermediate target with gradual SNR gains. Furthermore, …
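To make the layered-target idea concrete, below is a minimal PyTorch-style sketch assuming log-power-spectra (LPS) features; the class name, dimensions, and per-layer output heads are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ProgressiveEnhancer(nn.Module):
    """Sketch of SNR-based progressive learning: each stage maps its
    input toward an intermediate target with a higher SNR than the last."""
    def __init__(self, feat_dim=257, hidden_dim=2048, num_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            [nn.Sequential(
                nn.Linear(feat_dim if i == 0 else hidden_dim, hidden_dim),
                nn.ReLU())
             for i in range(num_stages)]
        )
        # One linear head per stage predicts that stage's intermediate LPS target.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, feat_dim) for _ in range(num_stages)]
        )

    def forward(self, noisy_lps):
        outputs = []
        h = noisy_lps
        for stage, head in zip(self.stages, self.heads):
            h = stage(h)
            outputs.append(head(h))  # target k: speech at SNR + k * gain
        return outputs  # the last element is the final clean-speech estimate
```

Training would supervise every element of `outputs` against its intermediate target, with the final head supervised against clean speech.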

Cited by 77 publications (45 citation statements) | References 29 publications (26 reference statements)
“…However, the generalization ability under mismatched conditions is the main problem of deep learning-based methods. Inspired by our previous work [21,22], we adopt an advanced LSTM architecture with a novel design of hidden layers via densely connected progressive learning and an output layer via multiple-target learning. The overall LSTM architecture aims to predict the clean log-power spectra (LPS) features given the input noisy LPS features with acoustic context.…”
Section: Speech Denoising
confidence: 99%
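A minimal sketch of what such a densely connected progressive LSTM might look like, assuming the noisy input and every earlier intermediate estimate are concatenated before each later target layer; all names and sizes are illustrative, not the cited paper's exact design.

```python
import torch
import torch.nn as nn

class DensePLLSTM(nn.Module):
    """Sketch of densely connected progressive learning with LSTMs:
    the noisy input and all earlier intermediate estimates are spliced
    (concatenated) to form the input of each later target layer."""
    def __init__(self, feat_dim=257, hidden_dim=1024, num_targets=3):
        super().__init__()
        self.lstms = nn.ModuleList()
        self.heads = nn.ModuleList()
        for k in range(num_targets):
            in_dim = feat_dim * (k + 1)  # noisy input + k earlier estimates
            self.lstms.append(nn.LSTM(in_dim, hidden_dim, batch_first=True))
            self.heads.append(nn.Linear(hidden_dim, feat_dim))

    def forward(self, noisy_lps):  # (batch, time, feat_dim)
        estimates = []
        dense_in = noisy_lps
        for lstm, head in zip(self.lstms, self.heads):
            h, _ = lstm(dense_in)
            est = head(h)
            estimates.append(est)
            dense_in = torch.cat([dense_in, est], dim=-1)  # dense splice
        return estimates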
“…This stacking-style network can learn multiple targets progressively and efficiently. In order to make full use of the rich set of information from the multiple learning targets, we update the progressive learning in [22] with dense structures [23], in which the input and the estimates of the intermediate targets are spliced together to learn the next target. Then, a weighted MMSE criterion in terms of multi-task learning (MTL) is designed to optimize all network parameters, randomly initialized, with K target layers as follows:…”
Section: Speech Denoising
confidence: 99%
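The excerpt truncates the criterion itself. A plausible form of such a weighted MMSE multi-task objective over K target layers (the symbols and weighting below are my assumptions, not the cited paper's exact equation) is

```latex
E = \frac{1}{N} \sum_{k=1}^{K} \alpha_k \sum_{n=1}^{N}
    \left\lVert \hat{\mathbf{x}}_n^{(k)} - \mathbf{x}_n^{(k)} \right\rVert_2^2
```

where $\hat{\mathbf{x}}_n^{(k)}$ is the network's estimate of the $k$-th intermediate target for frame $n$, $\mathbf{x}_n^{(k)}$ is that target (an LPS vector at a higher SNR than target $k-1$), and $\alpha_k$ weights the $k$-th task.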
“…To improve generalization, a common method is to increase the size of the network, such as adding more hidden layers for progressive training [23], using multistage networks [24], or adopting end-to-end approaches. Pascual et al. propose SEGAN to learn complex functions from large example sets [25].…”
Section: In the 1980s Ephraim and Malah Proposed the Minimum…
confidence: 99%
“…In recent years, deep neural network (DNN)-based speech enhancement methods have shown significant performance advantages over traditional approaches in complex noise environments, even with extremely nonstationary noises. Whether utilizing masking-based [9][10][11] or mapping-based [12][13][14][15][16][17] DNN methods, the general rule is to optimize a loss function between the ideal targets and the network's estimates so as to achieve as little error as possible over the global noisy speech dataset. Consequently, richer datasets, better objective functions, and improved neural network models have been further explored to guarantee the robust generalization ability of DNN models in the diversified noise environments of real life.…”
Section: Introduction
confidence: 99%
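To make the masking/mapping distinction in this excerpt concrete, here is a small NumPy sketch (function and variable names are illustrative) of the two kinds of training targets computed from paired clean and noise magnitude spectra:

```python
import numpy as np

def training_targets(clean_mag, noise_mag, eps=1e-8):
    """Illustrative targets from paired clean/noise magnitude spectra.

    Masking-based DNNs regress a mask such as the ideal ratio mask (IRM);
    mapping-based DNNs regress the clean spectrum (here, log-power) directly.
    """
    noisy_power = clean_mag**2 + noise_mag**2        # assumes uncorrelated sources
    irm = np.sqrt(clean_mag**2 / (noisy_power + eps))  # masking target
    clean_lps = np.log(clean_mag**2 + eps)             # mapping target
    return irm, clean_lps
```

At inference time, a masking model multiplies the noisy spectrum by the predicted mask, while a mapping model outputs the clean spectrum estimate directly.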