“…Magnitude pairs for flat S. Log-magnitude ratio (LMR) [12]: While the source-cancellation method removes the dependence on the signal S, the resulting features are complex, noisy, and difficult to interpret. This can be avoided by considering the magnitude representation, which gives the relative per-frequency energy between the channel signals.…”
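The LMR is invariant to the source because, in the frequency domain, each ear signal is the source spectrum multiplied by that ear's HRTF. The ratio |S_L|/|S_R| therefore reduces to |H_L|/|H_R|. The sketch below assumes NumPy and uses random FIR filters as hypothetical stand-ins for measured head-related impulse responses. It computes the LMR in dB for two different source signals and checks that the results coincide. The function name `log_magnitude_ratio`, the FFT size, and the small `eps` guard are illustrative choices, not taken from [12].

```python
import numpy as np

def log_magnitude_ratio(s_left, s_right, n_fft=512, eps=1e-12):
    """Per-frequency log-magnitude ratio 20*log10(|S_L| / |S_R|) in dB."""
    spec_left = np.fft.rfft(s_left, n=n_fft)
    spec_right = np.fft.rfft(s_right, n=n_fft)
    # eps guards against division by zero in (near-)empty bins.
    return 20.0 * np.log10((np.abs(spec_left) + eps) / (np.abs(spec_right) + eps))

rng = np.random.default_rng(0)
# Hypothetical left/right head-related impulse responses (random FIR stand-ins).
h_left = rng.standard_normal(32)
h_right = 0.5 * rng.standard_normal(32)

def binaural(source):
    # Anechoic model: each ear signal is the source convolved with that ear's filter.
    return np.convolve(source, h_left), np.convolve(source, h_right)

# Two unrelated sources; 256 + 32 - 1 = 287 <= 512 samples, so the zero-padded
# FFT turns the linear convolution into an exact per-bin product.
lmr_a = log_magnitude_ratio(*binaural(rng.standard_normal(256)))
lmr_b = log_magnitude_ratio(*binaural(rng.standard_normal(256)))

# The filter-only ratio 20*log10(|H_L| / |H_R|): the source spectrum has cancelled.
lmr_filters = 20.0 * np.log10(
    np.abs(np.fft.rfft(h_left, 512)) / np.abs(np.fft.rfft(h_right, 512))
)
print(np.max(np.abs(lmr_a - lmr_b)))
```

Taking magnitudes discards the phase that makes the raw cancellation features noisy. What remains is one real number per frequency bin, which is easier to interpret.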
“…Most closely related to our work are the source-cancellation and match-filtering algorithms [12], [13], [14], [15], where the binaural recordings (S_L for the left ear, S_R for the right ear) are represented as convolutions of a common sound-source signal S with the appropriate filters; for recordings made in an anechoic space, these filters are the HRTFs for the same direction (H_L for the left ear, H_R for the right ear). The per-frequency domain representation is given by…”
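The cancellation idea can be sketched under the frequency-domain model S_L = S·H_L and S_R = S·H_R. Cross-filtering gives S_L·H_R - S_R·H_L, which is zero only for the HRTF pair of the true direction. The code below scores each candidate direction in a hypothetical HRTF table by the normalized energy of this residual and returns the best one. The random table, the normalization, and the names `cancellation_residual` and `localize` are assumptions for illustration, not the formulation used in [12]-[15].

```python
import numpy as np

def cancellation_residual(spec_left, spec_right, hrtf_left, hrtf_right):
    """Normalized energy of S_L*H_R - S_R*H_L; ~0 when (H_L, H_R) is the true pair."""
    cross_left = spec_left * hrtf_right
    cross_right = spec_right * hrtf_left
    num = np.sum(np.abs(cross_left - cross_right) ** 2)
    # Normalizing keeps low-gain HRTF pairs from winning just by being small.
    den = np.sum(np.abs(cross_left) ** 2 + np.abs(cross_right) ** 2)
    return num / den

def localize(spec_left, spec_right, hrtf_table):
    """Return the candidate direction whose HRTF pair best cancels the source."""
    return min(
        hrtf_table,
        key=lambda d: cancellation_residual(spec_left, spec_right, *hrtf_table[d]),
    )

rng = np.random.default_rng(1)
n_bins = 129
# Hypothetical HRTF table: azimuth (degrees) -> (H_L, H_R) complex spectra.
hrtf_table = {
    az: (
        rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins),
        rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins),
    )
    for az in range(-80, 81, 20)
}

# Unknown source spectrum arriving from 40 degrees.
source = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
H_L, H_R = hrtf_table[40]
estimate = localize(source * H_L, source * H_R, hrtf_table)
print(estimate)
```

The source spectrum never enters the score except as a common factor, so no knowledge of S is needed. With measured HRTFs and real noise, the true direction gives a small residual rather than an exact zero.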
The human ability to localize sound-source direction using just two receivers is a complex process of inferring direction from the spectral cues of sound arriving at the ears. While these cues can be described using the well-known head-related transfer function (HRTF) concept, it is unclear how densely the HRTF must be sampled and whether a higher-order representation is employed in localization. We propose a class of binaural sound-source localization models to answer these two questions. First, using the sound received by the two ears, we derive several binaural features that are invariant to the sound-source signal. Second, these features are implicitly mapped to a high-dimensional reproducing kernel Hilbert space via a Gaussian process regression model for feature-direction tuples. Lastly, the features most relevant to the model are found via an efficient forward subset-selection method. Experimental results are shown for HRTFs from the CIPIC database.
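The sketch below gives a rough illustration of the last two steps, using NumPy and synthetic data. It fits a GP posterior mean with an RBF kernel, whose implicit feature map is into an RKHS. It then wraps the fit in greedy forward selection, which adds one feature at a time, choosing the feature that most reduces validation error. The kernel, length scale, noise level, validation split, and the synthetic target (which depends only on features 3 and 7) are all hypothetical. The abstract does not specify the paper's kernel or selection criterion.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """GP regression posterior mean: K_* (K + noise*I)^{-1} y."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(k, y_train)
    return rbf_kernel(x_test, x_train, length_scale) @ alpha

def forward_select(x_tr, y_tr, x_val, y_val, n_select):
    """Greedy forward selection: repeatedly add the feature that most lowers validation MSE."""
    chosen = []
    remaining = list(range(x_tr.shape[1]))
    for _ in range(n_select):
        errors = {}
        for j in remaining:
            cols = chosen + [j]
            pred = gp_predict(x_tr[:, cols], y_tr, x_val[:, cols])
            errors[j] = np.mean((pred - y_val) ** 2)
        best = min(errors, key=errors.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(2)
# Hypothetical features (10 per sample) and a scalar direction that uses only features 3 and 7.
x = rng.uniform(-1.0, 1.0, size=(200, 10))
y = np.sin(2.0 * x[:, 3]) + 0.5 * x[:, 7]
chosen = forward_select(x[:150], y[:150], x[150:], y[150:], n_select=2)
print(sorted(chosen))
```

Forward selection refits the model once per remaining feature per step. This is cheap for a handful of features but grows with the feature count, which is why the abstract stresses an efficient variant.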
“…Two important applications in intelligent audio surveillance are abnormal event detection [3,4] and sound source localization [13]. A comprehensive review of methods for audio surveillance was recently published [2].…”
We propose an architecture for real-time audio source localization that integrates localization methodologies within a framework employing a low-cost acquisition sensor. The architecture takes as input the audio signals from two calibrated microphones. It then computes biologically inspired features of the sound signal and estimates the source direction by means of a Gaussian Mixture Model estimator. We carried out an extensive experimental analysis on four data sets, one of which we created and made publicly available. We evaluated several characteristics of the sound localization architecture and its use in real scenarios.
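A Gaussian-mixture direction estimator of the kind mentioned above might look like the following sketch. It uses a mixture with one diagonal-Gaussian component per candidate direction, fitted from labelled feature vectors. The direction is read off as the component with the highest posterior probability. This is a simplified stand-in (no EM, one component per direction). The class name `DirectionGMM`, the two-dimensional features, and the synthetic cluster centres are hypothetical and do not describe the authors' features or training procedure.

```python
import numpy as np

class DirectionGMM:
    """Gaussian mixture with one diagonal component per candidate direction."""

    def fit(self, features, directions):
        self.labels = np.unique(directions)
        self.means = np.array([features[directions == d].mean(axis=0) for d in self.labels])
        # Small floor keeps variances strictly positive.
        self.variances = np.array(
            [features[directions == d].var(axis=0) + 1e-6 for d in self.labels]
        )
        self.log_priors = np.log([np.mean(directions == d) for d in self.labels])
        return self

    def log_posterior(self, x):
        """log p(direction | x) for each row of x, shape (n_samples, n_directions)."""
        diff = x[:, None, :] - self.means[None, :, :]
        ll = -0.5 * np.sum(diff**2 / self.variances + np.log(2 * np.pi * self.variances), axis=2)
        joint = ll + self.log_priors
        # Log-sum-exp normalization for numerical stability.
        m = joint.max(axis=1, keepdims=True)
        return joint - (m + np.log(np.sum(np.exp(joint - m), axis=1, keepdims=True)))

    def predict(self, x):
        return self.labels[np.argmax(self.log_posterior(x), axis=1)]

rng = np.random.default_rng(3)
# Hypothetical 2-D binaural features clustered around one centre per azimuth.
centres = {-60: [-1.0, -2.0], 0: [0.0, 0.0], 60: [1.0, 2.0]}
dirs = np.repeat([-60, 0, 60], 100)
feats = np.array([centres[int(d)] for d in dirs]) + 0.1 * rng.standard_normal((300, 2))
model = DirectionGMM().fit(feats, dirs)

test_dirs = np.array([-60, 0, 60])
test_feats = np.array([centres[int(d)] for d in test_dirs]) + 0.05 * rng.standard_normal((3, 2))
pred = model.predict(test_feats)
print(pred)
```

Keeping the full posterior rather than only the argmax also allows a soft estimate, for example the posterior-weighted mean azimuth.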