2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7471725

Combining non-negative matrix factorization and deep neural networks for speech enhancement and automatic speech recognition

Abstract: Sparse Non-negative Matrix Factorization (SNMF) and Deep Neural Networks (DNN) have emerged individually as two efficient machine learning techniques for single-channel speech enhancement. Nevertheless, only a few works have investigated the combination of SNMF and DNN for speech enhancement and robust Automatic Speech Recognition (ASR). In this paper, we present a novel combination of speech enhancement components based on SNMF and DNN into a full-stack system. We refine the cost function of the DNN to ba…
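The abstract is truncated, so the paper's exact SNMF formulation and DNN cost-function refinement are not available here. As background, a minimal sketch of the generic SNMF-masking approach to single-channel enhancement: decompose a noisy magnitude spectrogram onto concatenated speech and noise dictionaries, then apply a soft Wiener-style mask. Function names, the multiplicative-update rule, and the sparsity penalty form are illustrative assumptions, not the paper's method.

```python
import numpy as np

def snmf_activations(V, W, sparsity=0.1, n_iter=200, eps=1e-10):
    """Multiplicative updates for activations H with a fixed dictionary W,
    approximately minimizing ||V - W H||_F^2 + sparsity * sum(H)."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ (W @ H) + sparsity + eps)
    return H

def snmf_enhance(V_noisy, W_speech, W_noise, **kw):
    """Wiener-style speech estimate from a joint decomposition onto
    concatenated speech and noise dictionaries (illustrative sketch)."""
    W = np.hstack([W_speech, W_noise])
    H = snmf_activations(V_noisy, W, **kw)
    k = W_speech.shape[1]
    V_s = W_speech @ H[:k]               # speech part of the model
    V_n = W_noise @ H[k:]                # noise part of the model
    mask = V_s / (V_s + V_n + 1e-10)     # soft time-frequency mask in [0, 1]
    return mask * V_noisy
```

Because the mask lies in [0, 1], the estimate never exceeds the noisy magnitude in any time-frequency bin.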

Cited by 27 publications (25 citation statements)
References 13 publications (26 reference statements)
“…In the next sections, we apply these three steps to different cost functions c_ji. In each case, we specify the probability density function (pdf) involved in (2). We then choose the cost functions c_ji with respect to the possible decisions and hypotheses.…”
Section: Joint Detection and Estimation Approach: General Framework
confidence: 99%
“…The adaptive masking coefficient (a) is derived from the signal-to-noise ratio and is used to weight the IBM and IRM to obtain the adaptive mask as the training target for the DNN through Eq. (16).…”
Section: Deep Neural Network Model
confidence: 99%
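The excerpt above describes blending the Ideal Binary Mask (IBM) and Ideal Ratio Mask (IRM) with an SNR-derived coefficient, but its Eq. (16) is not quoted. A minimal per-bin sketch of that idea, where the sigmoid weighting `a` and the local-criterion threshold `lc_db` are assumed, illustrative choices:

```python
import numpy as np

def adaptive_mask(speech_pow, noise_pow, lc_db=0.0, eps=1e-10):
    """Blend IBM and IRM per time-frequency bin with an SNR-driven weight.
    The sigmoid form of `a` is an assumption, not the excerpt's Eq. (16)."""
    snr_db = 10.0 * np.log10((speech_pow + eps) / (noise_pow + eps))
    ibm = (snr_db > lc_db).astype(float)                         # ideal binary mask
    irm = np.sqrt(speech_pow / (speech_pow + noise_pow + eps))   # ideal ratio mask
    a = 1.0 / (1.0 + np.exp(-snr_db / 10.0))                     # assumed SNR weight
    return a * ibm + (1.0 - a) * irm
```

Since `a`, the IBM, and the IRM all lie in [0, 1], the blended mask does too, and high-SNR bins are pushed toward the binary decision while low-SNR bins fall back to the smoother ratio mask.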
“…At the same time, Huang et al. put forward a joint optimization of masks and deep recurrent neural networks (DRNN) for monaural source separation [15]. In 2016, Vu et al. also presented a speech enhancement algorithm combining non-negative matrix factorization and deep neural networks [16]. All of the algorithms mentioned above estimate the amplitude spectrum of the target speech.…”
Section: Introduction
confidence: 99%
“…Ambient noise can corrupt the signal components of speech and degrade recognition results. To address this problem, many methods for mitigating the effect of noise on ASR have been developed [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18].…”
Section: Introduction
confidence: 99%
“…To make ASR systems more robust in noisy environments, artificial neural networks (ANN), especially deep neural networks (DNN), have been widely utilized for speech enhancement in ASR in recent years [16, 17, 18]. The goal of the DNN is to implement a complex nonlinear function that directly maps log spectral features of noisy speech to those of the corresponding clean speech.…”
Section: Introduction
confidence: 99%
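The mapping idea in the excerpt above can be sketched with a toy regression network: a one-hidden-layer model trained by gradient descent to map noisy feature frames back to clean ones. The synthetic data, network size, and learning rate are all illustrative assumptions; real systems use spectral features of speech and much larger networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for feature frames: noisy = clean + Gaussian noise.
dim, n = 32, 512
clean = rng.standard_normal((n, dim))
noisy = clean + 0.3 * rng.standard_normal((n, dim))

# One-hidden-layer regression network trained with plain gradient
# descent on the mean-squared error between predicted and clean frames.
hid = 64
W1 = 0.1 * rng.standard_normal((dim, hid)); b1 = np.zeros(hid)
W2 = 0.1 * rng.standard_normal((hid, dim)); b2 = np.zeros(dim)

def forward(x):
    z = np.tanh(x @ W1 + b1)      # hidden activations
    return z, z @ W2 + b2         # hidden layer and linear output

mse_init = ((forward(noisy)[1] - clean) ** 2).mean()

lr = 0.05
for _ in range(300):
    z, pred = forward(noisy)
    err = pred - clean                         # gradient of MSE w.r.t. pred
    gW2 = z.T @ err / n; gb2 = err.mean(0)
    dz = (err @ W2.T) * (1.0 - z ** 2)         # backprop through tanh
    gW1 = noisy.T @ dz / n; gb1 = dz.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse_final = ((forward(noisy)[1] - clean) ** 2).mean()
```

Training drives the reconstruction error well below that of the untrained network, which is the essence of the enhancement-as-regression view described in the excerpt.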