Auditory inspired methods for localization of multiple concurrent speakers

Habib, Tufail; Romsdorfer, Harald

doi:10.1016/j.csl.2012.09.003

Cited by 5 publications

(5 citation statements)

References 43 publications


“…The second row in contrast to the first one applies position estimation, beamforming and enhancement. In [10] we study different possibilities for these blocks in a similar database and we draw the conclusion that the best performance is obtained when we use the PoPi position [5], convex-optimized beamforming [11,12] and vector Taylor series enhancement (VTS) [13]. Here, the VTS uses 128 Gaussians trained on a clean version of the Dev1 (the organizers provided us the impulse responses).…”

Section: Analysis Of the Full System 41 Experimental Results (mentioning)

confidence: 99%


Room localization for distant speech recognition

et al. 2014


The problem of room localization is to determine where, in a multi-room environment, a person is producing a speech utterance. In our work, we are exploiting the information gained from a network of microphones installed all over a house, where the lack of calibration of the microphone energies creates an additional challenge. This paper compares room localizers based on different features (such as energy and cross-correlation between microphones) and classifiers (such as neural networks and discriminative analysis). In order to evaluate the different room localizers in terms of word accuracy this paper also presents a complete distant speech recognition system which tries to take advantage of synergy between the different components without using any oracle information. Finally, the system is analyzed in terms of computational and time resources.



“…This can be solved in different ways such as using the WLAN signal emitted by a device [7] or with video cameras [14]. Some literature has tried to estimate the speaker position inside of a room using a microphone array [5] or a microphone network [1,6]. The innovation of this paper is to localize the room using a microphone network.…”

Section: Introduction (mentioning)

confidence: 99%

Room localization for distant speech recognition

et al. 2014


“…In another method [24], speech signal has been divided into several bands and the SRP values have been calculated on these bands; then, the maximum value of each band has been considered in the localisation process. Also, Habib and Romsdorfer [25] proposes a ‘position‐pitch’‐based algorithm for the localisation and tracking of concurrent speakers. This algorithm uses a multi‐band gammatone filterbank and a frequency‐selective criterion that groups frequency channels belonging to the same speaker.…”

Section: Subband Processing‐based Speaker Localisation (mentioning)

confidence: 99%

Subband processing‐based approach for the localisation of two simultaneous speakers

Firoozabadi, Abutalebi 2014, IET signal process.


Time difference of arrival (TDOA)-based techniques are a main category of speaker localisation methods. In a large subcategory of these methods, the generalised cross-correlation (GCC) is employed for TDOA estimation. In this study, the authors propose a subband processing-based method that computes the GCC of the microphone pairs in each subband. The information collected from different subbands is then combined together to estimate the direction of two simultaneous speakers. While the conventional methods consider the whole signal spectrum in the localisation procedure, the proposed method takes advantage of the difference in the frequency contents of the speakers. The proposed method computes the histograms of the peak positions of the GCC curve for each microphone pair in different subbands. These histograms are then fused using one of the three proposed histogram averaging methods, called simple, sectional, and weighted averaging. The proposed method has been evaluated on simulated and real speech data in noisy, reverberant, and noisy-reverberant conditions. The evaluation results demonstrate the superiority of the proposed subband processing-based method over its fullband counterpart. The authors' experiments also show that among different histogram averaging methods, the weighted averaging has greater performance in estimating the direction of speakers.
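As a rough illustration of the GCC-based TDOA estimation that the abstract above builds on, the following is a minimal fullband sketch for a single microphone pair. It is not the subband histogram method of the cited paper. The PHAT weighting, the function name `gcc_phat_delay`, and its parameters are assumptions chosen for illustration.

```python
import numpy as np

def gcc_phat_delay(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds (hypothetical helper).

    A positive result means y lags behind x.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)              # cross-power spectrum
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT weighting (assumed choice)
    cc = np.fft.irfft(cross, n=n)       # generalised cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

# Usage: a white-noise signal and a copy delayed by 5 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(5), x[:-5]))
tau = gcc_phat_delay(x, y, fs=16000)
```

In a subband variant, the same peak search would be run per frequency band and the peak positions collected across bands, which is where the histogram fusion described in the abstract would come in.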


“…However, its performance decreases in noisy and reverberant conditions as well as in multi-speaker scenarios. Different extensions have been proposed in [23][24][25][26][27] to increase the robustness of the algorithm in various aspects. The above stated subgrouping of the spectra (cf.…”

Section: Methods To Increase the Robustness (mentioning)

confidence: 99%

“…In [22], a joint position and pitch (PoPi) estimation method has been proposed which is based on either cross-correlations or crosspower spectral densities (CPSDs). Several extensions have been proposed using cepstral weighting [23], gammatonelike weighting [24], time-domain GCC-PHAT replacement [25], particle filtering [26], and speaker-dependent subgrouping [27]. In [28], a different method based on a recurrent timing neural network is used for joint DOA and pitch estimation.…”

Section: Introduction (mentioning)

confidence: 99%

Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios

Gerlach, Bitzer, Goetze et al. 2014, J AUDIO SPEECH MUSIC PROC.


In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have been proposed in the last decades. In this paper, we propose several extensions to a recently presented joint direction of arrival (DOA) and pitch estimation method, increasing its robustness in multi-speaker scenarios, noise, and reverberation. First, a spectral comb filter is added to the original algorithm to better cope with concurrent speakers. Second, the well-known generalized cross-correlation with phase transform (GCC-PHAT) is used as an additional weighting function to improve the DOA estimation accuracy in terms of correct hits. Third, using multiple microphone pairs, the multi-channel cross-correlation approach is incorporated to improve the robustness against noise and reverberation. In order to improve tracking for moving and even intersecting speakers, a particle filter is used. Experiments with real-world recordings in realistic acoustic conditions show that the proposed extensions increase the DOA hit rate by about 33% compared to the original algorithm for two step-wise moving sources at a signal-to-noise ratio (SNR) of 15 dB and a reverberation time RT60 of 560 ms.

