Abstract: Sound source localization and signal segregation using a small number of microphone elements is expected not only in multimedia products but also in daily-use products, such as hearing aids. The frequency-domain binaural model can localize a sound source and segregate signals arriving from a specific direction using two input signals. In this paper, a method for localizing two sound sources in azimuth and elevation using interaural phase and level differences is proposed. The performance of this localization is…
“…Currently, we are testing and evaluating this algorithm [15] with the physical realization (humanoid robot head) from Fig. 1.…”
Section: Preliminary Results
Citation type: mentioning (confidence: 99%)
“…While searching for the most suitable model for horizontal and vertical signal localization in humanoid robots, we implemented a promising binaural model [14].…”
[Figure: Block diagram of the binaural model of Chisaki et al. [15]. The input signal consists of the left and the right audio channels of the two ears.]
Section: Localization
Citation type: mentioning (confidence: 99%)
“…2, is based on the computation of the interaural phase differences (IPD) and the interaural level differences (ILD) in each frequency domain. In a nutshell, the idea of this algorithm is to compare the interaural difference values to a list of the previously recorded reference values, each representing a unique direction [15].…”
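The matching step described in this excerpt can be illustrated with a short sketch: per-bin IPD and ILD values are computed from the two channels and compared against a precomputed table of reference values, one entry per candidate direction. The code below is an illustrative reconstruction, not the implementation of Chisaki et al.; the function names, the reference-table layout, and the relative weighting of phase and level errors are assumptions.

# Minimal sketch of frequency-domain binaural localization: per-bin IPD/ILD
# values are matched against a table of reference values, one entry per
# candidate direction. Names and table format are illustrative assumptions.
import numpy as np

def ipd_ild(left_frame: np.ndarray, right_frame: np.ndarray):
    """Compute per-bin interaural phase and level differences (IPD, ILD)."""
    L = np.fft.rfft(left_frame)
    R = np.fft.rfft(right_frame)
    ipd = np.angle(L * np.conj(R))                 # phase difference per bin
    ild = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))  # in dB
    return ipd, ild

def localize(left_frame, right_frame, ref_ipd, ref_ild, directions):
    """Pick, for each frequency bin, the reference direction whose stored
    IPD/ILD best matches the observed values, then vote across bins.

    ref_ipd, ref_ild: arrays of shape (n_directions, n_bins), measured in
    advance for each candidate (azimuth, elevation) pair in `directions`.
    """
    ipd, ild = ipd_ild(left_frame, right_frame)
    # Wrapped phase distance plus a (heuristically weighted) level distance.
    phase_err = np.abs(np.angle(np.exp(1j * (ref_ipd - ipd))))
    level_err = np.abs(ref_ild - ild)
    cost = phase_err + 0.1 * level_err             # weight is an assumption
    best = cost.argmin(axis=0)                     # best direction per bin
    votes = np.bincount(best, minlength=len(directions))
    return directions[votes.argmax()]

A simple majority vote over bins is used here for aggregation; the excerpt after the next abstract suggests that the original method instead weights bins by signal energy.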
We present a prototype of a humanoid robot head equipped with human-like speech sound localization and production systems, designed for a new generation of robots that should autonomously evolve language and other cognitive skills. Like the human auditory apparatus, the robot head contains a binaural sensor system based on a frequency-domain binaural model. This enables the robot to detect and locate a speaker autonomously on the basis of the produced speech signals. In humans, however, the temporal regularity of incoming sounds is analyzed on different time scales: the millisecond range gives rise to the sensation of pitch, and periods on the order of seconds give rise to the sensation of rhythm. In addition, unlike for humans, detecting and localizing multiple sound signals is a nontrivial problem for machine audition. We therefore discuss a possible implementation of human-like spatiotemporal processing of sounds in single- and multi-source scenarios. Our future goals are to combine the constructed speech synthesis and physical audio systems, and to establish an algorithm for detailed spatiotemporal localization of both single and concurrent speech sound sources, with roughly human-like temporal and spatial processing capabilities.
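The two time scales of temporal regularity mentioned in this abstract can be made concrete with a minimal sketch: millisecond-scale periodicity (pitch) estimated from the autocorrelation of the waveform, and second-scale periodicity (rhythm/tempo) estimated from the autocorrelation of an energy-onset envelope. All parameter choices below (search ranges, hop size) are illustrative assumptions, not taken from the paper.

# Sketch of regularity analysis on two time scales: pitch from waveform
# autocorrelation (milliseconds), tempo from onset-envelope autocorrelation
# (seconds). Assumes a few seconds of mono audio x at sample rate fs.
import numpy as np

def pitch_hz(x: np.ndarray, fs: int, fmin=60.0, fmax=500.0) -> float:
    """Estimate fundamental frequency from the autocorrelation peak."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)             # lag search range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def tempo_bpm(x: np.ndarray, fs: int, hop=512) -> float:
    """Estimate tempo from periodicity of a coarse energy-onset envelope."""
    frames = len(x) // hop
    env = np.array([np.sum(x[i * hop:(i + 1) * hop] ** 2) for i in range(frames)])
    onset = np.maximum(np.diff(env), 0.0)               # energy increases only
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    fr = fs / hop                                       # envelope frame rate
    lo, hi = int(fr * 60 / 240), int(fr * 60 / 40)      # 40-240 BPM range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return 60.0 * fr / lag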
“…Chisaki et al. 4 suggest giving more importance to bins with higher signal energy, because a higher SNR can be expected for those. COMPaSS uses a similar weighting of the frequency bins based on signal energy and the achieved similarity values.…”
Section: Filter Scoring and Extraction
Citation type: mentioning (confidence: 99%)
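The energy-based weighting quoted above can be sketched as follows: high-energy bins are trusted more when per-bin direction decisions are aggregated. This is an assumed, simplified form of the idea; it does not reproduce the exact weighting used in COMPaSS or by Chisaki et al.

# Sketch of energy-weighted aggregation of per-bin direction decisions:
# each bin's vote is weighted by its signal energy instead of counting
# every bin equally, since higher-energy bins should have higher SNR.
import numpy as np

def weighted_direction_votes(best_dir_per_bin, bin_energy, n_directions):
    """Aggregate per-bin direction indices into an energy-weighted histogram
    and return the winning direction index."""
    weights = bin_energy / (bin_energy.sum() + 1e-12)   # normalize energies
    votes = np.zeros(n_directions)
    np.add.at(votes, best_dir_per_bin, weights)         # weighted histogram
    return int(votes.argmax())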
“…It was developed especially for speech sources and has been used as a front end for a speech recognition system. 7 Chisaki et al. 4 showed that FDBM is capable of localizing two concurrent sound sources in azimuth and elevation with high accuracy.…”
Sound source localization algorithms determine the physical position of a sound source with respect to a listener. For practical applications, a localization algorithm has to take into account real-world conditions such as multiple active sources, reverberation, and noise. The application can impose additional constraints on the algorithm, e.g., a requirement for low latency. This work defines the most important constraints for practical applications, introduces an algorithm that tries to fulfill all of these requirements as well as possible, and compares it to state-of-the-art sound source localization approaches.
An acoustic VR system provides a three-dimensional acoustical sensation of an existing sound field (such as a concert hall, stadium, or disaster site) and/or an imaginary sound field (as in movies or video games), independent of time and space. This chapter introduces the system configuration and applications of acoustic VR systems.

13.1 System Configuration

An example of a typical configuration of an acoustic VR system is shown in Fig. 13.1. The system consists of hardware (a PC, a digital audio interface, headphones, earplug-type microphones, and a head tracker), software for signal processing, and a database (HRTFs and pinna shapes). The main function of the acoustic VR system is to reproduce, through headphones, the ear-input signals obtained in an arbitrary sound field by means of the signal processing described in Chap. 12. This signal processing (convolution between a sound-source signal and HRIRs) is performed on the PC. Another function is to change the HRTFs to those of another direction in response to the head movement of a listener; a head tracker is used to capture the direction of the listener's head. Rewriting of the HRTFs must be completed within the detection threshold for system delay, i.e., 80 ms (Yairi et al. 2005). Various systems have adopted an individualization function for HRTFs to ensure accurate sound-image localization. An example system is shown in Fig. 13.2. The external specifications of the acoustic VR system developed in the author's laboratory, the Sound Image Reproduction system with Individualized HRTF, graphical User interface, and Successive head-movement tracking (SIRIUS), are shown in Table 13.1. This system runs on a Windows PC. An HRIR database (response length: 512 samples) is stored on the PC. A sound-source signal and HRIRs are convolved in real time in order to control the direction and distance of a sound image.
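A minimal sketch of the signal path just described is given below: a mono source is convolved with the left/right HRIR pair for the current direction, and the HRIR pair is switched when the head tracker reports a new orientation. The database layout, block-based processing, and nearest-direction lookup are simplifying assumptions; the actual SIRIUS implementation, including the required sub-80 ms HRTF rewriting and any crossfading between HRIR pairs, is not reproduced here.

# Sketch of real-time binaural rendering: convolve a mono block with the
# HRIR pair for the current direction; the head tracker callback selects
# the nearest stored direction. Database format is an assumption.
import numpy as np
from scipy.signal import fftconvolve

class BinauralRenderer:
    def __init__(self, hrir_db):
        # hrir_db: dict mapping a quantized direction (az, el) in degrees
        # to a pair of 512-sample impulse responses (assumed format).
        self.db = hrir_db
        self.direction = (0, 0)

    def on_head_tracker(self, az: int, el: int):
        """Called by the head tracker; selects the nearest stored HRIR pair.
        In a real system this switch must complete within ~80 ms."""
        self.direction = min(self.db, key=lambda d: (d[0] - az) ** 2 + (d[1] - el) ** 2)

    def render_block(self, mono_block: np.ndarray):
        """Convolve one block of the source with the current HRIR pair."""
        hrir_l, hrir_r = self.db[self.direction]
        left = fftconvolve(mono_block, hrir_l, mode="full")
        right = fftconvolve(mono_block, hrir_r, mode="full")
        return left, right   # tails must be overlap-added across blocks

Block-wise FFT convolution with overlap-add keeps latency low enough for interactive head tracking; a production system would also crossfade between the old and new HRIR outputs to avoid clicks when the direction changes.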