Spatial diffuseness features for DNN-based speech recognition in noisy and reverberant environments

Schwarz, Andreas; Huemmer, Christian; Maas, Roland; Kellermann, Walter

doi:10.1109/icassp.2015.7178798

Cited by 29 publications

(23 citation statements)

References 19 publications

(34 reference statements)


“…Also, results have since been generalized from omnidirectional microphones to other microphone directivities [12], [13] and spherical microphone arrays [14]. While these estimates can be used for the formulation of postfilters for signal enhancement [15], which is the main application considered in this contribution, short-time CDR estimates (or the equivalent "diffuseness" measure) also have applications in parametric coding of spatial audio signals [16] and the extraction of spatial features for automatic speech recognition (ASR) [17].…”

Section: Introduction (mentioning)

confidence: 98%

Coherent-to-Diffuse Power Ratio Estimation for Dereverberation

Schwarz

Kellermann

2015

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite


The estimation of the time- and frequency-dependent coherent-to-diffuse power ratio (CDR) from the measured spatial coherence between two omnidirectional microphones is investigated. Known CDR estimators are formulated in a common framework, illustrated using a geometric interpretation in the complex plane, and investigated with respect to bias and robustness towards model errors. Several novel unbiased CDR estimators are proposed, and it is shown that knowledge of either the direction of arrival (DOA) of the target source or the coherence of the noise field is sufficient for unbiased CDR estimation. The validity of the model for the application of CDR estimates to dereverberation is investigated using measured and simulated impulse responses. A CDR-based dereverberation system is presented and evaluated using signal-based quality measures as well as automatic speech recognition accuracy. The results show that the proposed unbiased estimators have a practical advantage over existing estimators, and that the proposed DOA-independent estimator can be used for effective blind dereverberation. Comment: submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing, 201
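The relationship the abstract describes, between measured coherence and the CDR, can be illustrated with a minimal sketch. It is not one of the estimators proposed in the paper. It only inverts the standard two-component mixing model, assuming that both the direct-path coherence (known DOA, expressed here as a time difference of arrival) and the diffuse-field coherence (the sinc model for omnidirectional microphones) are known. All function names, the microphone spacing, and the mapping diffuseness = 1 / (CDR + 1) are assumptions made for illustration.

```python
import numpy as np


def diffuse_coherence(freq_hz, mic_distance_m, sound_speed=343.0):
    """Coherence of a spherically isotropic field between two omni mics.

    np.sinc(x) is sin(pi*x)/(pi*x), so 2*f*d/c gives sin(2*pi*f*d/c)/(2*pi*f*d/c).
    """
    return np.sinc(2.0 * freq_hz * mic_distance_m / sound_speed)


def direct_coherence(freq_hz, tdoa_s):
    """Coherence of a single plane wave with the given time difference of arrival."""
    return np.exp(1j * 2.0 * np.pi * freq_hz * tdoa_s)


def mixed_coherence(cdr, gamma_s, gamma_n):
    """Mixing model: coherence of direct plus diffuse sound at a given CDR."""
    return (cdr * gamma_s + gamma_n) / (cdr + 1.0)


def cdr_from_coherence(gamma_x, gamma_s, gamma_n):
    """Invert the mixing model; keep the real part and clip negatives to zero.

    Assumption: gamma_x differs from gamma_s (finite CDR) and freq > 0.
    """
    ratio = (gamma_n - gamma_x) / (gamma_x - gamma_s)
    return np.maximum(np.real(ratio), 0.0)


def diffuseness(cdr):
    """Assumed mapping from CDR to a diffuseness value in (0, 1]."""
    return 1.0 / (cdr + 1.0)


# Hypothetical setup: 8 cm spacing, 0.1 ms TDOA, true CDR of 2 (linear scale).
freqs = np.array([500.0, 1000.0, 2000.0])
gamma_n = diffuse_coherence(freqs, 0.08)
gamma_s = direct_coherence(freqs, 1e-4)
gamma_x = mixed_coherence(2.0, gamma_s, gamma_n)
cdr_hat = cdr_from_coherence(gamma_x, gamma_s, gamma_n)
```

With exact model coherence the inversion recovers the true CDR. With short-time coherence estimates from real recordings the result is noisy and can be biased, which is the problem the abstract says the proposed unbiased estimators address.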


“…a mask [8][9][10]. In the multichannel case, several approaches have been proposed to pass spatial information directly to a DNN, for instance using phase difference features between non-coincident microphones [11] or coherence features [12]. However, in these two studies, the mask estimated by the DNN is still applied as a single-channel filter only.…”

Section: Introduction (mentioning)

confidence: 99%

Multichannel Speech Separation with Recurrent Neural Networks from High-Order Ambisonics Recordings

Perotin

Serizel

Guérin

2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


“…Although considerable progresses were made at multi-microphone frontend processing level in order to feed ASR with an enhanced speech input [4,5,6,7,8], the performance loss observed from close-talking to distant-speech remains quite critical, even when the most advanced DNN-based backend frameworks are adopted [9,10,11,12,13,14].…”

Section: Introduction (mentioning)

confidence: 99%

The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments

Ravanelli

Cristoforetti

Gretter

et al.

2015

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)


This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also includes typical domestic background noise as well as inter/intra-room reverberation effects. Dev and test sets were derived, which represent a very precious material for different studies on multi-microphone speech processing and distant-speech recognition. Various tasks and corresponding Kaldi recipes have already been developed. The paper reports a first set of baseline results obtained using different techniques, including Deep Neural Networks (DNN), aligned with the state-of-the-art at international level.

