2014
DOI: 10.1186/1687-4722-2014-13

Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

Abstract: We present a feature enhancement method that uses neural networks (NNs) to map reverberant features in the log-mel spectral domain to their corresponding anechoic features. The mapping is performed by cascade NNs trained with the Cascade2 algorithm and an implementation of segment-based normalization. Experiments using speaker identification (SID) and automatic speech recognition (ASR) systems were conducted to evaluate the method. The SID experiments were conducted using our own simulated and real reverbera…
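The abstract mentions segment-based normalization but does not spell it out. As a hedged sketch (the function name, segment length, and per-band zero-mean choice are illustrative assumptions, not the paper's exact procedure), one common form shifts each fixed-length segment of log-mel frames to zero mean per mel band before it is fed to the mapping network:

```python
import numpy as np

def segment_normalize(logmel, seg_len=100):
    """Per-segment mean normalization of log-mel features (sketch).

    Each block of `seg_len` frames is shifted to zero mean per mel
    band; the final, possibly shorter segment is handled the same way.
    """
    out = np.empty_like(logmel)
    n = logmel.shape[0]
    for start in range(0, n, seg_len):
        seg = logmel[start:start + seg_len]
        out[start:start + seg_len] = seg - seg.mean(axis=0)
    return out

feats = np.random.randn(250, 24)   # 250 frames, 24 mel bands
norm = segment_normalize(feats, seg_len=100)
```

Normalizing per segment rather than per utterance lets the statistics track slowly varying channel and reverberation effects within long recordings.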

Cited by 11 publications (6 citation statements) | References 38 publications (52 reference statements)
“…In choosing the context frames, we use every second frame relative to the center frame in order to reduce the redundancies caused by the windowing of STFT. Although this causes some information loss, this enables the supervectors to represent a longer context [16], [48]. In addition, we do not use the magnitude spectra of the context frames directly, but the difference of magnitude between the context frames and the center frame.…”
Section: DNN Spectral Models (mentioning)
Confidence: 99%
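The context scheme quoted above (every second frame relative to the center, stacked as magnitude differences from the center frame) can be sketched roughly as follows; the function name, context width, and edge clamping are illustrative assumptions, not the cited paper's exact code:

```python
import numpy as np

def context_supervector(frames, t, n_context=3, step=2):
    """Supervector for center frame t (sketch).

    Takes every `step`-th frame on each side of the center frame and
    stacks the *differences* from the center frame, plus the center
    frame itself, reducing redundancy from overlapping STFT windows
    while covering a longer temporal context.
    """
    center = frames[t]
    parts = [center]
    for k in range(1, n_context + 1):
        for idx in (t - k * step, t + k * step):
            idx = min(max(idx, 0), len(frames) - 1)  # clamp at edges
            parts.append(frames[idx] - center)
    return np.concatenate(parts)

mags = np.abs(np.random.randn(50, 257))  # |STFT| frames, 257 bins
sv = context_supervector(mags, t=10)     # (1 + 2*3) * 257 = 1799 values
```

Stacking differences rather than raw context spectra centers the context information on the current frame, which the quoted passage argues reduces redundancy.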
“…In choosing the context frames, we use every second frame relative to the center frame in order to reduce the redundancies caused by the windowing of STFT. Although this causes some information loss, this enables the supervectors to represent a longer context [28,29]. In addition, we do not use the feature values of context frames directly, but the difference between the values of the context frames and the center frame.…”
Section: DNN Input and Output (mentioning)
Confidence: 99%
“…Recent work has shown that deep neural networks can be very effective for channel compensation in speech recognition algorithms [7,8,9]. The type of channel compensation DNNs are used for falls into three basic categories: waveform compensation [10,8], feature compensation [11,12,13,8,9] and multicondition classification [14,15,8,9]. The first two categories are very similar in that they use a DNN regression to reconstruct some possibly intermediate feature representation from a clean channel using some possibly different feature representation of the same data from a noisy channel.…”
Section: Introduction (mentioning)
Confidence: 99%