ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683861
Data-driven Design of Perfect Reconstruction Filterbank for DNN-based Sound Source Enhancement

Abstract: We propose a data-driven design method for a perfect-reconstruction filterbank (PRFB) for deep-neural-network (DNN)-based sound-source enhancement (SSE). DNNs have been used to estimate a time-frequency (T-F) mask in the short-time Fourier transform (STFT) domain. Their training is more stable when a simple cost function such as the mean-squared error (MSE) is utilized, compared with advanced costs such as objective sound-quality assessments. However, such a simple cost function inherits strong assumptions on the sta…
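The abstract describes the standard pipeline this work builds on: a DNN estimates a T-F mask in the STFT domain and is trained with an MSE cost. A minimal numpy sketch of that pipeline is shown below; the oracle ratio mask stands in for a DNN output, and all signal names, frame sizes, and the toy sinusoid-plus-noise data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """Hann-windowed one-sided STFT, frames stacked along axis 0."""
    w = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

rng = np.random.default_rng(0)
t = np.arange(4096) / 16000
clean = np.sin(2 * np.pi * 440 * t)          # stand-in "speech"
noise = 0.5 * rng.standard_normal(len(t))
noisy = clean + noise

S, N, X = stft(clean), stft(noise), stft(noisy)

# Oracle ratio mask in place of a trained DNN's mask estimate.
mask = np.abs(S) / (np.abs(S) + np.abs(N) + 1e-8)
enhanced = mask * X

# MSE on magnitude spectrograms, the kind of simple cost the abstract mentions.
mse_enh = np.mean((np.abs(enhanced) - np.abs(S)) ** 2)
mse_noisy = np.mean((np.abs(X) - np.abs(S)) ** 2)
assert mse_enh < mse_noisy  # masking reduces the spectrogram-domain error
```

The masked spectrogram sits closer to the clean target than the noisy one under the MSE measure, which is exactly the quantity a mask-estimating DNN would be trained to minimize.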

Cited by 17 publications (12 citation statements)
References 28 publications
“…Their collaboration with spectrogram consistency was also briefly mentioned. Some research has suggested that time-frequency representation is a worthwhile topic for discussion also in deep-learning-based methods [47,48], and we hope that this paper (together with the supplemental MATLAB code) will be helpful for developing audio … [Footnote 12: Equation numbers indicate the definitions of DGTs. Those with a tilde were calculated with the same but zero-phased window (the index was rotated before the FFT so that the peak of the window becomes the first element; see the supplemental code).]…”
Section: Discussion (citation type: mentioning; confidence: 99%)
“…An important requirement in DNN-based speech enhancement and separation is generalization, i.e., working for any speaker. To achieve this in speech enhancement, several studies train a global M using many speech samples spoken by many speakers [3-14]. Unfortunately, in speech separation, generalization cannot be achieved solely by using a large-scale training dataset because there is no way of knowing which signal in the speech mixture is the target.…”
Section: Auxiliary Speaker-aware Feature for Speech Separation (citation type: mentioning; confidence: 99%)
“…Recently, speech enhancement has been advanced by the use of a deep neural network (DNN) to estimate a T-F mask. For effectively modelling a speech signal, which is time-sequential data, a recurrent neural network (RNN) is used in various speech signal processing applications [1-14].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
“…By combining three gated units (an input gate, a forget gate, and an output gate), the LSTM solves the vanishing-gradient problem to some extent. As it can be trained effectively in practice, the LSTM and the bidirectional LSTM (BLSTM) have been applied to speech enhancement and performed better than the conventional methods of the time [2,4,8-14].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
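The last citation statement names the three LSTM gates that make the cell trainable in practice. A minimal numpy LSTM step showing those gates is sketched below; the weight packing, dimensions, and random inputs are generic illustrative choices, not tied to any cited paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; W, U, b pack input/forget/output/candidate parameters."""
    H = h.shape[0]
    z = W @ x + U @ h + b                  # shape (4H,)
    i = sigmoid(z[0:H])                    # input gate
    f = sigmoid(z[H:2 * H])                # forget gate
    o = sigmoid(z[2 * H:3 * H])            # output gate
    g = np.tanh(z[3 * H:4 * H])            # candidate cell state
    c_new = f * c + i * g                  # forget gate scales old memory
    h_new = o * np.tanh(c_new)             # output gate scales exposed state
    return h_new, c_new

rng = np.random.default_rng(1)
D, H = 3, 4                                # input and hidden sizes (arbitrary)
W = 0.1 * rng.standard_normal((4 * H, D))
U = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for _ in range(5):                         # run over a short input sequence
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
assert np.all(np.abs(h) < 1.0)             # hidden state is tanh-bounded
```

The additive cell update `c_new = f * c + i * g` is what mitigates vanishing gradients: the forget gate provides a near-linear path for gradients through time, unlike a plain RNN's repeatedly squashed state.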