Processing of reverberant speech for time-delay estimation

Yegnanarayana, B.; Prasanna, S. R. Mahadeva; Duraiswami, Ramani; Zotkin, Dmitry N.

doi:10.1109/tsa.2005.853005

Cited by 70 publications

(38 citation statements)

References 18 publications

“…The epochs can be used as pitch markers for prosody manipulation, which is useful in applications like text-to-speech synthesis, voice conversion and speech rate conversion [3], [4]. Knowledge of the epoch locations may be used for estimating the time-delay between speech signals collected over a pair of spatially distributed microphones [5]. The segmental signal-to-noise ratio (SNR) of the speech signal is high in the regions around epochs, and hence, it is possible to enhance the speech by exploiting the characteristics of speech signals around the epochs [6].…”

Section: A Significance Of Epochs In Speech Analysis (mentioning)

confidence: 99%

Epoch Extraction From Speech Signals

Murty

Yegnanarayana

2008

IEEE Trans. Audio Speech Lang. Process.

Self Cite

538

288

Abstract: Epoch is the instant of significant excitation of the vocal-tract system during production of speech. For most voiced speech, the most significant excitation takes place around the instant of glottal closure. Extraction of epochs from speech is a challenging task due to time-varying characteristics of the source and the system. Most epoch extraction methods attempt to remove the characteristics of the vocal-tract system, in order to emphasize the excitation characteristics in the residual. The performance of such methods depends critically on our ability to model the system. In this paper, we propose a method for epoch extraction which does not depend critically on characteristics of the time-varying vocal-tract system. The method exploits the nature of impulse-like excitation. The proposed zero resonance frequency filter output brings out the epoch locations with high accuracy and reliability. The performance of the method is demonstrated using CMU-Arctic database using the epoch information from the electro-glottograph as reference. The proposed method performs significantly better than the other methods currently available for epoch extraction. The interesting part of the results is that the epoch extraction by the proposed method seems to be robust against degradations like white noise, babble, high-frequency channel, and vehicle noise.

“…The step-sizes have been chosen such that all algorithms reach same asymptotic NPM. As before, true delays for direct-paths have been employed for ext-NMCFLMS [8] while we have employed GCC with PHAT prefilter of the Hilbert envelope of LP residual of speech [14] to estimate TDOA of direct-paths for ext-NMCFLMSDPE. After initial convergence, NMCFLMS and ext-NMCFLMS misconverge whereas ext-NMCFLMSDPE avoids misconvergence.…”

Section: Simulation Results (mentioning)

confidence: 99%

“…For reverberant speech, an effective method has been proposed in [14] which performs GCC on the Hilbert envelope of linear prediction (LP) residual of input speech.…”

Section: The GCC Algorithm (mentioning)

confidence: 99%

A Practical Adaptive Blind Multichannel Estimation Algorithm with Application to Acoustic Impulse Responses

Ahmad

Khong

Naylor

2007

2007 15th International Conference on Digital Signal Processing

We propose a noise robust adaptive blind multichannel identification algorithm for acoustic impulse responses. It has been known that the normalized multichannel frequency domain least-mean-square (NMCFLMS) algorithm misconverges under low signal-to-noise ratio. The coefficients of NMCFLMS converge initially toward the true impulse response after which they then misconverge. The extended NMCFLMS (ext-NMCFLMS) algorithm which has been proposed to mitigate this misconvergence problem assumes the knowledge of magnitude and time-differences-of-arrival (TDOA) of the direct paths for the acoustic impulse responses. In this work, we show how the TDOA can be obtained. More importantly, we present a novel approach to estimate the magnitude of the direct path component under practical conditions. We then show how these estimates can be incorporated to the proposed ext-NMCFLMS with direct path estimation algorithm. We analyze how errors in these estimates affect the performance of the proposed algorithm.
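
The two citation statements above describe estimating the TDOA by running GCC with a PHAT prefilter on the Hilbert envelope of the LP residual. The following is a minimal sketch of that idea, not the cited authors' implementation. The function names `hilbert_envelope` and `gcc_phat_delay` and the `max_lag` parameter are hypothetical, the envelope is computed with a plain FFT-based analytic signal, and the inputs are assumed to be LP residuals that were computed beforehand.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the analytic signal of x, built with an FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # zero out negative frequencies.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

def gcc_phat_delay(a, b, max_lag):
    """Lag in samples by which b trails a (positive if b is delayed).

    Assumption: max_lag is a positive integer smaller than len(a).
    """
    n = len(a) + len(b)              # zero-pad to avoid circular wrap-around
    A = np.fft.rfft(a, n)
    B = np.fft.rfft(b, n)
    cross = B * np.conj(A)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n)
    # Reorder so the array covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

# Usage sketch: res_1 and res_2 stand for LP residuals of two microphone signals.
# delay = gcc_phat_delay(hilbert_envelope(res_1), hilbert_envelope(res_2), max_lag=40)
```

The PHAT weighting gives every frequency bin equal weight, so the correlation peak depends on phase alignment rather than on spectral shape.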

“…A class of temporal processing methods have been proposed by exploiting the excitation source characteristics of the speech signal for the enhancement (Yegnanarayana et al 1999; Yegnanarayana & Satyanarayana Murthy 2000; Yegnanarayana et al 2003, 2005). Linear prediction (LP) residual obtained by inverse filtering the speech is used as an estimate of the source of excitation of the vocal tract system (Yegnanarayana et al 1999).…”

Section: Motivation For the Combined Temporal And Spectral Processing (mentioning)

confidence: 99%

Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments

Krishnamoorthy

Prasanna

2009

Sadhana

This paper presents an experimental evaluation of the combined temporal and spectral processing methods for speaker recognition task under noise, reverberation or multi-speaker environments. Automatic speaker recognition system gives good performance in controlled environments. Speech recorded in real environments by distant microphones is degraded by factors like background noise, reverberation and interfering speakers. This degradation strongly affects the performance of the speaker recognition system. Combined temporal and spectral processing (TSP) methods proposed in our earlier study are used for pre-processing to improve the speaker-specific features and hence the speaker recognition performance. Different types of degradation like background noise, reverberation and interfering speaker are considered for evaluation. The evaluation is carried out for the individual temporal processing, spectral processing and the combined TSP method. The experimental results show that the combined TSP methods give relatively higher recognition performance compared to either temporal or spectral processing alone.
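
The citation statement for this entry refers to the LP residual obtained by inverse filtering the speech. As a rough illustration only, the sketch below computes LP coefficients for one frame with the autocorrelation method (Levinson-Durbin recursion) and applies the inverse filter. The function names, the default order of 10, and the absence of windowing or pre-emphasis are assumptions made here, not details taken from the cited papers.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Inverse-filter coefficients [1, a1, ..., a_order] via Levinson-Durbin."""
    n = len(frame)
    # Autocorrelation at lags 0..order (zero lag sits at index n - 1).
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                        # prediction error energy so far
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lp_residual(frame, order=10):
    """Residual e[n] = s[n] + a1*s[n-1] + ... + a_order*s[n-order]."""
    a = lp_coefficients(frame, order)
    return np.convolve(frame, a)[:len(frame)]

# Usage sketch: `frame` stands for a short segment of speech samples.
# residual = lp_residual(frame, order=10)
```

Because the inverse filter removes most of the vocal-tract resonances it was fitted to, the residual has lower energy than the input frame and is dominated by the excitation.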
