A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation

Palomäki, Kalle; Brown, Guy J.; Wang, DeLiang L.

doi:10.1016/j.specom.2004.03.005

Cited by 88 publications (90 citation statements)

References 33 publications


“…It is based on the perceptual emphasis of the first wave front and has been implemented to both dereverberate and separate speech signals (Palomaki et al., 2004). We tested and implemented a precedence model developed by Hummersone et al. (2010).…”

Section: Dereverberation Methods (mentioning, confidence: 99%)
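One simple way to picture "perceptual emphasis of the first wave front" is onset-enhancing inhibition: each frequency channel's envelope is reduced by a smoothed copy of its own recent past, so energy that arrives shortly after an onset (such as early reflections) is attenuated. The sketch below is an illustration of that idea only, not the model of Palomäki et al. (2004) or Hummersone et al. (2010); the one-pole smoother, the coefficient `alpha` and the inhibition gain `g` are assumptions.

```python
def onset_emphasis(envelope, alpha=0.9, g=1.0):
    """Hypothetical precedence-style inhibition for one frequency channel.

    envelope: non-negative envelope samples (list of floats).
    alpha: one-pole smoothing coefficient (assumed), 0 <= alpha < 1.
    g: inhibition gain (assumed).
    Returns the half-wave rectified difference between the envelope and a
    smoothed copy of its past values, which emphasises onsets.
    """
    smoothed = 0.0  # running low-pass estimate of the previous envelope values
    out = []
    for x in envelope:
        # Inhibit the current sample by the smoothed past, clipping at zero.
        out.append(max(0.0, x - g * smoothed))
        # Update the smoother only after use, so an onset is not self-inhibited.
        smoothed = alpha * smoothed + (1.0 - alpha) * x
    return out
```

For a step input such as `[0, 0, 1, 1, 1, 1]`, the first non-zero sample passes unchanged and the following samples decay geometrically, so a sustained input is progressively suppressed (towards zero when `g = 1`) while an abrupt rise is preserved.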

Two-Microphone Dereverberation for Automatic Speech Recognition of Polish

Kundegorski, Jackson, Ziółko (2015), Archives of Acoustics


Reverberation is a common problem for many speech technologies, such as automatic speech recognition (ASR) systems. This paper investigates the novel combination of precedence, binaural and statistical independence cues for enhancing reverberant speech, prior to ASR, under these adverse acoustical conditions when two microphone signals are available. Results of the enhancement are evaluated in terms of relevant signal measures and accuracy for both English and Polish ASR tasks. These show inconsistencies between the signal and recognition measures, although in recognition the proposed method consistently outperforms all other combinations and the spectral-subtraction baseline.
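The abstract compares the proposed method against a spectral-subtraction baseline whose details are not given here. A minimal sketch of generic magnitude spectral subtraction on one STFT frame is shown below; it assumes the interference magnitude per bin is already estimated (for reverberation this would be some estimate of late reverberant energy) and uses an assumed spectral floor `beta`. It is not claimed to be the baseline configuration used in the paper.

```python
import cmath


def spectral_subtract(frame, noise_mag, beta=0.01):
    """Generic magnitude spectral subtraction for one STFT frame.

    frame: complex STFT coefficients of the degraded signal.
    noise_mag: estimated interference magnitude per bin (assumed given).
    beta: spectral floor as a fraction of the degraded magnitude (assumed).
    The degraded phase is reused for the enhanced coefficients.
    """
    out = []
    for y, n in zip(frame, noise_mag):
        mag = abs(y)
        # Subtract the interference estimate, but never go below the floor.
        clean = max(mag - n, beta * mag)
        out.append(cmath.rect(clean, cmath.phase(y)))
    return out
```

The floor keeps over-subtracted bins from becoming zero or negative, which is a common way to limit "musical noise" artefacts.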


“…Numerous algorithms have been proposed for developing the values of M[n, k] based on the inputs (e.g. [6,7,8,9,11,12,13]) and other variations are possible in which M[n, k] is a continuous function of the inputs rather than binary. In the algorithms considered, the mask M[n, k] is typically based on the cell-by-cell comparisons of the left and right input signals; however, T-F masking is also widely applied to mono audio to improve signal quality for ASR [14,15,16] and for human intelligibility [17,18].…”

Section: Time-frequency Masking (mentioning, confidence: 99%)
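As an illustration of a mask M[n, k] built from a cell-by-cell comparison of the left and right inputs, the sketch below assumes the target is straight ahead (zero interaural phase difference) and keeps a time-frequency cell when the interaural phase difference (IPD) of the two STFT coefficients is within an assumed threshold `max_ipd`. A continuous variant maps the IPD to a weight in [0, 1] instead of a binary decision. Neither function is the specific algorithm of any cited work; the STFT matrices `L` and `R` (indexed `[n][k]`) are assumed to be computed elsewhere.

```python
import cmath
import math


def ipd(left, right):
    """Interaural phase difference of two STFT coefficients, in [-pi, pi]."""
    return cmath.phase(left * right.conjugate())


def binary_mask(L, R, max_ipd=0.5):
    """M[n][k] = 1 if |IPD| <= max_ipd, else 0 (target assumed at 0 degrees)."""
    return [[1 if abs(ipd(l, r)) <= max_ipd else 0 for l, r in zip(row_l, row_r)]
            for row_l, row_r in zip(L, R)]


def soft_mask(L, R):
    """Continuous variant: weight (1 + cos IPD) / 2, equal to 1 when in phase."""
    return [[(1.0 + math.cos(ipd(l, r))) / 2.0 for l, r in zip(row_l, row_r)]
            for row_l, row_r in zip(L, R)]


def apply_mask(M, L, R):
    """Apply the mask cell by cell to the average of the two channels."""
    return [[m * (l + r) / 2 for m, l, r in zip(row_m, row_l, row_r)]
            for row_m, row_l, row_r in zip(M, L, R)]
```

For example, with `L = [[1+0j, 1j]]` and `R = [[1+0j, 1+0j]]`, the first cell is in phase and is kept, while the second has an IPD of pi/2 and is removed by the binary mask (and halved by the soft mask).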

“…Results of previous studies using these techniques (e.g. [6,7,8,9,10,11,12]) suggest the following observations (among others): While T-F masking techniques are typically well motivated, there has been little formal mathematical analysis of them, with performance typically expressed in terms of secondary statistics such as the accuracy of automatic speech recognition (ASR) systems. While it is true that algorithms developed to improve ASR recognition accuracy must be evaluated in terms of ASR performance, we also believe that further mathematical analysis and comparison to linear beamforming are potentially beneficial, as speech recognition experiments tend to […] [Footnote: This work has been supported by the National Science Foundation (Grant IIS-I0916918) and the Cisco Corporation (Grant 570877).]…”

Section: Introduction (mentioning, confidence: 99%)
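For a point of comparison with linear beamforming, the magnitude response of a two-microphone delay-and-sum beamformer steered to broadside (0 degrees) is |1 + e^{-j 2π f τ(θ)}| / 2, with τ(θ) = d sin θ / c. The sketch assumes free-field plane-wave arrival, an illustrative microphone spacing `spacing_m` and c = 343 m/s.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def delay_and_sum_gain(freq_hz, angle_deg, spacing_m=0.2):
    """Magnitude response of a two-mic delay-and-sum beamformer steered to 0 degrees.

    A plane wave from angle_deg reaches the second microphone
    tau = d * sin(theta) / c seconds later; averaging the two signals gives
    |1 + exp(-j * 2 * pi * f * tau)| / 2.
    """
    tau = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return abs(1 + cmath.exp(-2j * math.pi * freq_hz * tau)) / 2
```

With these assumed values a source at 90 degrees is nulled at about 857.5 Hz (where fτ = 0.5) but still passed with a gain of about 0.98 at 100 Hz, which illustrates why closely spaced two-microphone linear processing offers little spatial selectivity at low frequencies.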

An analysis of binaural spectro-temporal masking as nonlinear beamforming

Moghimi, Stern (2014), 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Array-based time-frequency masking algorithms are an important type of nonlinear array processing. In this paper we develop a model that characterizes the directional sensitivity of these algorithms in a fashion similar to the commonly used beam patterns that characterize linear array processing. Two alternative formulations are described, and it is shown that one of them accurately predicts the signal distortion and processing gain of time-frequency masking, as well as the speech recognition accuracy that time-frequency masking affords in the presence of additive interfering sources.
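To see how a masking rule can have a direction-dependent "beam pattern", one crude proxy is to place a single plane-wave source at angle θ, compute the IPD it produces in each frequency bin, and record the fraction of bins that an IPD-threshold mask would keep. This is a simplified free-field illustration, not either of the formulations developed in the paper; the spacing, the threshold `max_ipd` and the bin frequencies are assumed values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def wrapped_ipd(freq_hz, angle_deg, spacing_m):
    """IPD produced by a free-field plane wave, wrapped to [-pi, pi)."""
    tau = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    phi = 2 * math.pi * freq_hz * tau
    return (phi + math.pi) % (2 * math.pi) - math.pi


def mask_pass_fraction(angle_deg, freqs_hz, spacing_m=0.2, max_ipd=0.5):
    """Fraction of frequency bins in which a lone source at angle_deg is kept."""
    kept = sum(1 for f in freqs_hz
               if abs(wrapped_ipd(f, angle_deg, spacing_m)) <= max_ipd)
    return kept / len(freqs_hz)
```

Evaluated over bins from 100 Hz to 4 kHz, a source at 0 degrees is kept in every bin, while a source at 90 degrees is kept only in bins near 0 Hz and near multiples of c/d ≈ 1.7 kHz, where its phase difference wraps back to zero (spatial aliasing).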


“…The grouping stage then groups the components that are likely to be from the same source e.g. using information such as simultaneous onset/offset of particular frequency amplitudes or relationships of particular frequencies to source pitch [45][46][47][48][49][50]. It is well-known that the ICA technique is not effective in separating the underdetermined mixtures, for which, as mentioned above, one has to turn to, e.g.…”

Section: Introduction (mentioning, confidence: 99%)
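The pitch cue mentioned in the excerpt above can be illustrated with harmonic grouping: a spectral component is assigned to a source when its frequency lies close to an integer multiple of that source's fundamental frequency (F0). The sketch assumes the candidate F0 values are already known and uses a hypothetical relative tolerance `tol`; onset/offset cues are not modelled.

```python
def group_by_harmonicity(component_freqs, f0s, tol=0.03):
    """Assign each component frequency (Hz) to the F0 whose harmonic it best matches.

    Returns a dict mapping each F0 to its component frequencies; components
    that are not within tol (relative error) of any harmonic go under None.
    """
    groups = {f0: [] for f0 in f0s}
    groups[None] = []
    for f in component_freqs:
        best, best_err = None, tol
        for f0 in f0s:
            harmonic = max(1, round(f / f0))  # nearest harmonic number (at least 1)
            err = abs(f - harmonic * f0) / (harmonic * f0)  # relative mismatch
            if err <= best_err:
                best, best_err = f0, err
        groups[best].append(f)
    return groups
```

For components at 200, 400, 310, 620, 500 and 730 Hz with assumed F0s of 100 and 155 Hz, the multiples of 100 Hz and of 155 Hz form two groups, and 730 Hz, which fits neither harmonic series within 3%, is left ungrouped.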