2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6637612

Variational EM for binaural sound-source separation and localization

Abstract: The sound-source separation and localization (SSL) problems are addressed within a unified formulation. Firstly, a mapping between white-noise source locations and binaural cues is estimated. Secondly, SSL is solved via Bayesian inversion of this mapping in the presence of multiple sparse-spectrum emitters (such as speech), noise and reverberations. We propose a variational EM algorithm which is described in detail together with initialization and convergence issues. Extensive real-data experiments show that t…

Cited by 24 publications (32 citation statements)
References 17 publications
“…In the binaural hearing context, Deleforge and Horaud have proposed a probabilistic piecewise affine regression model that infers the localization-to-interaural data mapping and its inverse [18]. They have extended this approach to the case of multiple sources using the variational Expectation Maximization (EM) framework [19], [20]. In [21], another approach was presented based on a Gaussian Mixture Model (GMM) which was used to learn the azimuth-dependent distribution of the binaural feature space.…”
Section: Introduction (mentioning)
confidence: 99%
“…The second direction is data-driven, and consists in learning a mapping from measured high-dimensional acoustic features to source positions. Such mappings are learned from carefully recorded datasets in a supervised [5,6] or semi-supervised [7] way. Since obtaining these datasets is time consuming, the methods usually work well for one specific room and setup, and are hard to generalize in practice.…”
Section: Introduction (mentioning)
confidence: 99%
“…Initializing ˜ to zero and choosing permutation $\pi$, in the first step, the squared distance matrix $\tilde{M}_{\pi}$ as defined in (10) is double centered [28] 3 (steps 4) followed by SVD to obtain a low-rank matrix $\tilde{M}_{\pi}$ along with the position matrix $\tilde{X}_{\pi}$; $d_{\pi}$ is equal to the last row of $\tilde{X}_{\pi}$. In the second step, ˜ is updated by solving (11) and (12). Based on the new estimate of ˜, $\tilde{M}_{\pi}$ is updated using (10).…”
Section: Synchronization (mentioning)
confidence: 99%
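
The "double centering followed by SVD" step quoted above matches the textbook classical multidimensional scaling (MDS) recipe. The sketch below illustrates that generic recipe only, assuming a complete, noise-free squared Euclidean distance matrix; it is not the citing paper's algorithm, and the updates in its equations (10) to (12) are not reproduced. The function name `classical_mds` and the random test points are hypothetical, chosen for illustration.

```python
import numpy as np


def classical_mds(sq_dist, dim):
    """Recover coordinates (up to rotation, reflection, translation)
    from a matrix of squared Euclidean distances."""
    n = sq_dist.shape[0]
    # Centering matrix J = I - (1/n) * 1 1^T
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double centering converts squared distances into a Gram matrix
    gram = -0.5 * centering @ sq_dist @ centering
    # SVD of the symmetric Gram matrix; keep the `dim` leading components
    u, s, _ = np.linalg.svd(gram)
    return u[:, :dim] * np.sqrt(s[:dim])


# Illustrative data: six random 2-D points (an assumption, not from the source)
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)

coords = classical_mds(sq_dist, dim=2)
# Distances are invariant to rotation and translation, so they should match
recovered_sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
```

The recovered coordinates generally differ from the original points by a rigid transform, which is why the check compares pairwise distances rather than positions.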
“…Furthermore, the data-driven learning and generative modeling of location-dependent spatial characteristics has been shown to be promising for sound source localization in a reverberant environment; in [12] and [13] room- and microphone-location-specific models were trained on white noise signals and incorporated for 2D-localization with two microphones. Nesta and Omologo [14] presented an approach that exploited sparsity of source signals in the cross-power spectral domain and accounted in a statistical manner for deviations of the sources' spatial characteristics from an ideal anechoic propagation model caused by multipath effect.…”
Section: Introduction (mentioning)
confidence: 99%