Integrating binaural cues and blind source separation method for separating reverberant speech mixtures

Alinaghi, Atiyeh; Wang, Wenwu; Jackson, Philip J. B.

doi:10.1109/icassp.2011.5946377

Cited by 20 publications

(52 citation statements)

References 11 publications

“…We selected 15 utterances spoken by both male and female speakers at random of the same length (about 3s) and then shortened to 2.5 s as in [5].…”

Section: Results (mentioning)

confidence: 99%

“…Previous work [1,3,5,7] suggested that by normalizing and pre-whitening the observation the results will be improved. Therefore we will normalize the amplitude of the observations as follows:…”

Section: The Observation Set (mentioning)

confidence: 99%

“…In the mixture model, as in [5], we combine three cues, the T-F observations, the ILD values and the IPD values, denoted by x(ω,t), α(ω,t) and φ (ω,t), respectively, where the total number of time frames is T and the total number of frequency channels is Ω. The total number of initial sources is denoted by I.…”

Section: Model Description and Parameter Estimation (mentioning)

confidence: 99%
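
The two interaural cues named in the statement above can be illustrated with a short sketch. This is not the cited authors' code: it assumes the common convention that the ILD is the left-to-right magnitude ratio in decibels and the IPD is the wrapped phase difference between the two channels at each T-F point. The function name `binaural_cues`, the `eps` guard and the nested-list layout (frequency channels by time frames) are illustrative assumptions.

```python
import cmath
import math


def binaural_cues(left, right, eps=1e-12):
    """Compute per-point ILD (dB) and IPD (radians) for two complex T-F grids.

    `left` and `right` are nested lists of complex T-F coefficients with the
    same shape (frequency channels x time frames). Assumed convention:
    ILD = 20*log10(|L|/|R|), IPD = arg(L * conj(R)), wrapped to (-pi, pi].
    """
    ild, ipd = [], []
    for row_l, row_r in zip(left, right):
        ild_row, ipd_row = [], []
        for l, r in zip(row_l, row_r):
            l, r = complex(l), complex(r)
            # eps keeps the ratio finite when a bin is silent
            ild_row.append(20.0 * math.log10((abs(l) + eps) / (abs(r) + eps)))
            # phase of L * conj(R) equals arg(L) - arg(R), already wrapped
            ipd_row.append(cmath.phase(l * r.conjugate()))
        ild.append(ild_row)
        ipd.append(ipd_row)
    return ild, ipd


# One frequency channel, two time frames (toy values, not real data)
left = [[2 + 0j, 1j]]
right = [[1 + 0j, 1 + 0j]]
alpha, phi = binaural_cues(left, right)
```

In this toy input the first point has twice the magnitude on the left with no phase offset, and the second has equal magnitudes with the left channel leading by a quarter cycle. In practice the grids would come from a short-time Fourier transform of the left and right recordings.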

“…We examined a similar approach by integrating the spatial cues in the separation process. In this paper we model the T-F point as a mixture of complex-Gaussian distribution, similar to [7] and we also model the IPD and ILD as a mixture of Gaussians [4,5]. In this Bayesian framework, we establish proper conjugate prior distributions on the parameters of the model.…”

Section: Introduction (mentioning)

confidence: 99%

“…Hence, it is natural to employ the spatial cues in the separation process, such as the interaural level difference (ILD), the interaural phase difference (IPD), and the mixing vectors as examined in [3,4,5], where separation is achieved by modeling the various observations as Gaussian mixtures and applying the Expectation-Maximization (EM) algorithm to obtain the model parameters. However, there are some limitations with the EM algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

Underdetermined Model-Based Blind Source Separation of Reverberant Speech Mixtures using Spatial Cues in a Variational Bayesian Framework

Popa, Wang, Alinaghi

2013

IET Intelligent Signal Processing Conference 2013 (ISP 2013)

Abstract. In this paper, we propose a new method for underdetermined blind source separation of reverberant speech mixtures by classifying each time-frequency (T-F) point of the mixtures according to a combined variational Bayesian model of spatial cues, under a sparse signal representation assumption. We model the T-F observations by a variational mixture of circularly-symmetric complex Gaussians. The spatial cues, e.g. interaural level difference (ILD), interaural phase difference (IPD) and mixing vector cues, are modelled by a variational mixture of Gaussians. We then establish appropriate conjugate prior distributions for the parameters of all the mixtures to create a variational Bayesian framework. Using the Bayesian approach, we then iteratively estimate the hyper-parameters of the prior distributions by optimizing the variational posterior distribution. The main advantage of this approach is that no prior knowledge of the number of sources is needed; the number is determined automatically by the algorithm. Unlike the Expectation-Maximization (EM) algorithm, the proposed approach does not suffer from the overfitting problem and is therefore not sensitive to initialization.
