Joint Position-Pitch Estimation for Multiple Speaker Scenarios

Képesi, Marián; Ottowitz, L.; Habib, Tufail

doi:10.1109/hscma.2008.4538694

Cited by 15 publications

(19 citation statements)

References 7 publications

“…Applying such a bank of FIR filters on a block of the observed signal, we get (16) where denotes the Hadamard product, and are the temporal and spatial filter lengths, respectively, is the th coefficient of the th filter in the filterbank, , , is a column vector of ones, and…”

Section: A. Optimal Filterbanks (mentioning)

confidence: 99%

“…It should also be noted that the DOA along with the pitch also are believed to be some of the governing factors that the human auditory system uses for separating sources. This line of reasoning has, quite recently, led to some joint DOA and fundamental frequency estimators, including maximum likelihood based [10], [11], subspace-based [12]-[14], correlation-based [15], [16], and filtering-based [17]-[19] methods. Notably, the problem of joint DOA and fundamental frequency estimation was formalized and thoroughly analyzed in [10], and a maximum likelihood estimator that achieves the highest possible accuracy (under certain conditions) was proposed.…”

Section: Introduction (mentioning)

confidence: 99%

Joint Spatio-Temporal Filtering Methods for DOA and Fundamental Frequency Estimation

Jensen, Christensen, Benesty, et al. 2014

IEEE/ACM Trans. Audio Speech Lang. Process.

Abstract: In this paper, spatio-temporal filtering methods are proposed for estimating the direction-of-arrival (DOA) and fundamental frequency of periodic signals, like those produced by the speech production system and many musical instruments, using microphone arrays. This topic has quite recently received some attention in the community and is quite promising for several applications. The proposed methods are based on optimal, adaptive filters that leave the desired signal, having a certain DOA and fundamental frequency, undistorted and suppress everything else. The filtering methods simultaneously operate in space and time, whereby it is possible to resolve cases that are otherwise problematic for pitch estimators or DOA estimators based on beamforming. Several special cases and improvements are considered, including a method for estimating the covariance matrix based on the recently proposed iterative adaptive approach (IAA). Experiments demonstrate the improved performance of the proposed methods under adverse conditions compared to the state of the art using both synthetic signals and real signals, as well as illustrate the properties of the methods and the filters.

Index Terms: 2-D filtering, DOA estimation, fundamental frequency estimation, joint estimation, LCMV beamformer, periodogram-based beamformer.

“…We have taken the CPSD-based method proposed in [22] combined with cepstral weighting [23], gammatone-like weighting [24], and a subsequent particle filtering [26] as the core algorithm, and we propose several extensions to improve both accuracy and robustness in this paper. As a first extension, a frequency-domain comb filter is introduced to improve the performance for simultaneously active speakers.…”

Section: Introduction (mentioning)

confidence: 99%

“…In [22], a joint position and pitch (PoPi) estimation method has been proposed which is based on either cross-correlations or cross-power spectral densities (CPSDs). Several extensions have been proposed using cepstral weighting [23], gammatone-like weighting [24], time-domain GCC-PHAT replacement [25], particle filtering [26], and speaker-dependent subgrouping [27]. In [28], a different method based on a recurrent timing neural network is used for joint DOA and pitch estimation.…”

Section: Introduction (mentioning)

confidence: 99%

“…Since we are interested in joint DOA and pitch estimation, the main feature of the algorithm is the computation of a two-dimensional (2D) pattern for DOA and pitch. As core algorithm, the CPSD-based method proposed in [22] combined with cepstral weighting [23] and gammatone-like weighting [24] is used. To enable speaker tracking, a subsequent particle filter [26] is also part of the core algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios

Gerlach, Bitzer, Goetze, et al. 2014

J AUDIO SPEECH MUSIC PROC.

In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have been proposed in the last decades. In this paper, we propose several extensions to a recently presented joint direction of arrival (DOA) and pitch estimation method, increasing its robustness in multi-speaker scenarios, noise, and reverberation. First, a spectral comb filter is added to the original algorithm to better cope with concurrent speakers. Second, the well-known generalized cross-correlation with phase transform (GCC-PHAT) is used as an additional weighting function to improve the DOA estimation accuracy in terms of correct hits. Third, using multiple microphone pairs, the multi-channel cross-correlation approach is incorporated to improve the robustness against noise and reverberation. In order to improve tracking for moving and even intersecting speakers, a particle filter is used. Experiments with real-world recordings in realistic acoustic conditions show that the proposed extensions increase the DOA hit rate by about 33% compared to the original algorithm for two step-wise moving sources at a signal-to-noise ratio (SNR) of 15 dB and a reverberation time RT60 of 560 ms.

Auditory inspired methods for localization of multiple concurrent speakers

Habib, Romsdorfer 2013

Computer Speech & Language
