2014
DOI: 10.1109/taslp.2014.2318514

Memory-Enhanced Neural Networks and NMF for Robust ASR

Abstract: In this article, we address the problem of distant speech recognition for reverberant noisy environments. Speech enhancement methods, e.g., using non-negative matrix factorization (NMF), are successful in improving the robustness of ASR systems. Furthermore, discriminative training and feature transformations are employed to increase the robustness of traditional systems using Gaussian mixture models (GMM). On the other hand, acoustic models based on deep neural networks (DNN) were recently shown to outperform …

Cited by 27 publications (13 citation statements)
References 37 publications (64 reference statements)
“…It is likely that RNNs and discriminative training strategies can further improve performance of the proposed framework. The system in [9] proposes model combination–it uses an NMF feature enhancement frontend and discriminatively trained GMM-AMs, and combines it with a long short-term memory based acoustic model. It obtains an average WER of 20.0, 4.6 percentage points worse than the proposed system.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…In [29], the recurrent architecture is introduced into the DNN-HMM hybrid system and the authors can achieve state-of-the-art performances on both the 2nd CHiME challenge (track 2) [30] and Aurora-4 tasks without front-end preprocessing, speaker adaptive training or multiple decoding passes. Recently long short-term memory (LSTM) recurrent architectures are also introduced in the hybrid system [31], [32] to further improve the robustness of the acoustic models.…”
Section: DNN-Based Approaches to Noise Robust ASR
Citation type: mentioning
confidence: 99%
“…STFT coefficients of the array signals are used as the input of the beamforming network. In ASR systems of [12,13], the speech signal is enhanced by NMF and LSTM before fed into the acoustic model. But speech enhancement module and the acoustic model are not jointly optimized to minimize the WER and the input is only single channel signal.…”
Section: Introduction
Citation type: mentioning
confidence: 99%