Abstract: Direction of arrival estimation using a spherical microphone array is an important and growing research area. One promising algorithm is the recently proposed Subspace Pseudo-Intensity Vector method. In this contribution the Subspace Pseudo-Intensity Vector method is combined with a state-of-the-art method for robustly estimating the centres of mass in a 2D histogram based on matching pursuits. The performance of the improved Subspace Pseudo-Intensity Vector method is evaluated in the context of localising multi…
“…Over the years, several approaches have been developed for the task of broadband DOA estimation. Some popular approaches are: i) subspace based approaches such as multiple signal classification (MUSIC) [1], [2], ii) time difference of arrival (TDOA) based approaches that use the family of generalized cross correlation (GCC) methods [3], [4], iii) generalizations of the cross-correlation methods such as steered response power with phase transform (SRP-PHAT) [5], and multichannel cross correlation coefficient (MCCC) [6], iv) adaptive multichannel time delay estimation using blind system identification based methods [7], v) probabilistic model based methods such as maximum likelihood method [8] and vi) methods based on histogram analysis of narrowband DOA estimates [9], [10]. These methods are generally formulated under the assumption of free-field propagation of sound waves, however in indoor acoustic environments this assumption is violated due to the presence of reverberation leading to severe degradation in their performance.…”
Supervised learning based methods for source localization, being data driven, can be adapted to different acoustic conditions via training and have been shown to be robust to adverse acoustic environments. In this paper, a convolutional neural network (CNN) based supervised learning method for estimating the direction-of-arrival (DOA) of multiple speakers is proposed. Multi-speaker DOA estimation is formulated as a multi-class multi-label classification problem, where the assignment of each DOA label to the input feature is treated as a separate binary classification problem. The phase component of the short-time Fourier transform (STFT) coefficients of the received microphone signals is directly fed into the CNN, and the features for DOA estimation are learnt during training. Utilizing the assumption of disjoint speaker activity in the STFT domain, a novel method is proposed to train the CNN with synthesized noise signals. Through experimental evaluation with both simulated and measured acoustic impulse responses, the ability of the proposed DOA estimation approach to adapt to unseen acoustic conditions and its robustness to unseen noise types are demonstrated. Through additional empirical investigation, it is also shown that with an array of M microphones the proposed framework yields the best localization performance with M-1 convolution layers. The ability of the proposed method to accurately localize speakers in a dynamic acoustic scenario with a varying number of sources is also shown.
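The multi-label formulation in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the DOA grid (0° to 180° in 5° steps) and the helper names `multi_hot_target` and `binary_cross_entropy` are assumptions chosen for illustration. It only shows how each DOA class becomes its own binary problem, with a per-class sigmoid and a binary cross-entropy averaged over classes.

```python
import math


def multi_hot_target(source_doas_deg, resolution_deg=5, max_deg=180):
    """Encode active source DOAs as a multi-hot vector over a DOA grid.

    The grid range and resolution are illustrative assumptions.
    """
    num_classes = max_deg // resolution_deg + 1
    target = [0.0] * num_classes
    for doa in source_doas_deg:
        target[round(doa / resolution_deg)] = 1.0
    return target


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, target, eps=1e-12):
    """Average of independent per-class binary cross-entropies."""
    total = 0.0
    for z, t in zip(logits, target):
        p = sigmoid(z)  # per-class probability, no softmax across classes
        total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total / len(target)


# Two simultaneous speakers at 30 and 90 degrees (hypothetical values).
target = multi_hot_target([30, 90])
good_logits = [4.0 if t == 1.0 else -4.0 for t in target]
bad_logits = [-z for z in good_logits]
good_loss = binary_cross_entropy(good_logits, target)
bad_loss = binary_cross_entropy(bad_logits, target)
```

Because the classes are scored independently, any number of entries in the target can be active at once, which is what lets this kind of formulation handle a varying number of sources.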
“…We investigate the criteria under which smoothed histograms of PIVs and SSPIVs give accurate estimates of the DOAs of multiple sources in a noisy reverberant environment, including when sources are moving. Some of the first steps of an earlier version of the SSPIV method were presented in [13] and [29]. The current paper extends both the theoretical analysis and the evaluation of the PIV method compared to [8], especially in the context of multiple and moving speakers and in real-world applications.…”
Section: Introduction (mentioning)
confidence: 68%
“…The relative gain, 0 ≤ g ≤ 1, and phase, −π < γ ≤ π, of the second plane wave with respect to the first give α₂ = gα₁ and β₂ = β₁ + γ. Therefore, from (29),…”
Section: Coherent Sources (mentioning)
confidence: 99%
“…We assume that β₁ and β₂ are independent with identical uniform distribution U(0, 2π) such that ∆β = β₁ − β₂ is a triangular distribution over the interval ∆β ∈ [−2π, 2π] which, due to periodicity of the phase, reduces to ∆β ∈ [−π, π] with probability p(∆β) = 1/(2π). The expected value of Ĩ is obtained by integrating (29) with respect to ∆β, …”
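The claim that the wrapped phase difference is uniform with density 1/(2π) can be checked numerically. The following is a minimal Monte Carlo sketch that assumes only the independence and U(0, 2π) assumptions stated in the excerpt; the function names, sample count and bin count are illustrative choices, not taken from the cited paper.

```python
import math
import random


def wrapped_phase_differences(n, seed=0):
    """Draw n samples of (beta1 - beta2) wrapped to [-pi, pi)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        beta1 = rng.uniform(0.0, 2.0 * math.pi)
        beta2 = rng.uniform(0.0, 2.0 * math.pi)
        diff = beta1 - beta2  # triangular on [-2*pi, 2*pi] before wrapping
        samples.append((diff + math.pi) % (2.0 * math.pi) - math.pi)
    return samples


def histogram_density(samples, bins=8):
    """Empirical probability density in equal-width bins over [-pi, pi)."""
    width = 2.0 * math.pi / bins
    counts = [0] * bins
    for s in samples:
        idx = min(int((s + math.pi) / width), bins - 1)
        counts[idx] += 1
    return [c / (len(samples) * width) for c in counts]


densities = histogram_density(wrapped_phase_differences(200_000))
expected = 1.0 / (2.0 * math.pi)  # uniform density on an interval of length 2*pi
```

Each entry of `densities` should lie close to `expected`, reflecting that the two halves of the triangular density fold onto each other under the 2π periodicity and sum to a constant.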
Abstract: Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.
A conventional approach to wideband Multi-Source (MS) Direction-of-Arrival (DOA) estimation is to perform Single Source (SS) DOA estimation in Time-Frequency (TF) bins for which a SS assumption is valid. The typical SS-validity confidence metrics analyse the validity of the SS assumption over a fixed-size TF region local to the TF bin. The performance of such methods degrades as the number of simultaneously active sources increases due to the associated decrease in the size of the TF regions where the SS assumption is valid. A SS-validity confidence metric is proposed that exploits a dynamic MS assumption over relatively larger TF regions. The proposed metric first clusters the initial DOA estimates (one per TF bin) and then uses the members' spatial consistency as well as its cluster's spread to weight each TF bin. Distance-based and density-based clustering are employed as two alternative approaches for clustering DOAs. A noise-robust density-based clustering is also used in an evolutionary framework to propose a method for source counting and source direction estimation. The evaluation results based on simulations and also with real recordings show that the proposed weighting strategy significantly improves the accuracy of source counting and MS DOA estimation compared to the state-of-the-art.