2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
DOI: 10.1109/icassp.2015.7178798

Spatial diffuseness features for DNN-based speech recognition in noisy and reverberant environments

Abstract: We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a re…

Citations: cited by 29 publications (23 citation statements)
References: 19 publications (34 reference statements)
“…Also, results have since been generalized from omnidirectional microphones to other microphone directivities [12], [13] and spherical microphone arrays [14]. While these estimates can be used for the formulation of postfilters for signal enhancement [15], which is the main application considered in this contribution, short-time CDR estimates (or the equivalent "diffuseness" measure) also have applications in parametric coding of spatial audio signals [16] and the extraction of spatial features for automatic speech recognition (ASR) [17].…”
Section: Introduction (mentioning)
confidence: 98%
“…a mask [8][9][10]. In the multichannel case, several approaches have been proposed to pass spatial information directly to a DNN, for instance using phase difference features between non-coincident microphones [11] or coherence features [12]. However, in these two studies, the mask estimated by the DNN is still applied as a single-channel filter only.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although considerable progresses were made at multi-microphone frontend processing level in order to feed ASR with an enhanced speech input [4,5,6,7,8], the performance loss observed from close-talking to distant-speech remains quite critical, even when the most advanced DNN-based backend frameworks are adopted [9,10,11,12,13,14].…”
Section: Introduction (mentioning)
confidence: 99%