Abstract: Sound source localization and signal segregation using a small number of microphone elements is expected not only in multimedia products but also in daily-use products, such as hearing aids. The frequency-domain binaural model can localize a sound source and segregate signals arriving from a specific direction using two input signals. In this paper, a method for localizing two sound sources in azimuth and elevation using interaural phase and level differences is proposed. The performance of this localization is…
“…Currently, we are testing and evaluating this algorithm [15] with the physical realization (humanoid robot head) from Fig. 1.…”
Section: Preliminary Results
Citation type: mentioning (confidence: 99%)
“…While searching for the most suitable model for horizontal and vertical signal localization in humanoid robots, we implemented a promising binaural model [14].…”
[Figure: Block diagram of the binaural model of Chisaki et al. [15]. The input signal consists of the left and the right audio channels of the two ears.]
Section: Localization
Citation type: mentioning (confidence: 99%)
“…2, is based on the computation of the interaural phase differences (IPD) and the interaural level differences (ILD) in each frequency domain. In a nutshell, the idea of this algorithm is to compare the interaural difference values to a list of the previously recorded reference values, each representing a unique direction [15].…”
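The matching step described in this excerpt can be illustrated with a short sketch: per-bin IPD and ILD values are computed from the two channels and compared against a precomputed table of reference values, one entry per candidate direction. The code below is an illustrative reconstruction, not the implementation of Chisaki et al.; the function names, the reference-table layout, and the relative weighting of phase and level errors are assumptions.

# Minimal sketch of frequency-domain binaural localization: per-bin IPD/ILD
# values are matched against a table of reference values, one entry per
# candidate direction. Names and table format are illustrative assumptions.
import numpy as np

def ipd_ild(left_frame: np.ndarray, right_frame: np.ndarray):
    """Compute per-bin interaural phase and level differences (IPD, ILD)."""
    L = np.fft.rfft(left_frame)
    R = np.fft.rfft(right_frame)
    ipd = np.angle(L * np.conj(R))                 # phase difference per bin
    ild = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))  # in dB
    return ipd, ild

def localize(left_frame, right_frame, ref_ipd, ref_ild, directions):
    """Pick, for each frequency bin, the reference direction whose stored
    IPD/ILD best matches the observed values, then vote across bins.

    ref_ipd, ref_ild: arrays of shape (n_directions, n_bins), measured in
    advance for each candidate (azimuth, elevation) pair in `directions`.
    """
    ipd, ild = ipd_ild(left_frame, right_frame)
    # Wrapped phase distance plus a (heuristically weighted) level distance.
    phase_err = np.abs(np.angle(np.exp(1j * (ref_ipd - ipd))))
    level_err = np.abs(ref_ild - ild)
    cost = phase_err + 0.1 * level_err             # weight is an assumption
    best = cost.argmin(axis=0)                     # best direction per bin
    votes = np.bincount(best, minlength=len(directions))
    return directions[votes.argmax()]

A simple majority vote over bins is used here for aggregation; the excerpt after the next abstract suggests that the original method instead weights bins by signal energy.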
We present a prototype of a humanoid robot head equipped with human-like speech sound localization and production systems, designed for a new generation of robots that should autonomously evolve language and other cognitive skills. Like the human auditory apparatus, the robot head contains a binaural sensor system based on a frequency-domain binaural model. This enables the robot to detect and locate a speaker autonomously on the basis of the produced speech signals. In humans, however, the temporal regularity of incoming sounds is analyzed on different time scales: the millisecond range gives rise to the sensation of pitch, and periods on the order of seconds give rise to the sensation of rhythm. In addition, unlike for humans, detecting and localizing multiple sound signals is a nontrivial problem for machine audition. We therefore discuss a possible implementation of human-like spatiotemporal processing of sounds in single- and multi-source scenarios. Our future goals are to combine the constructed speech synthesis and physical audio systems, and to establish an algorithm for detailed spatiotemporal localization of both single and concurrent speech sound sources, with roughly human-like temporal and spatial processing capabilities.
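The two time scales of temporal regularity mentioned in this abstract can be made concrete with a minimal sketch: millisecond-scale periodicity (pitch) estimated from the autocorrelation of the waveform, and second-scale periodicity (rhythm/tempo) estimated from the autocorrelation of an energy-onset envelope. All parameter choices below (search ranges, hop size) are illustrative assumptions, not taken from the paper.

# Sketch of regularity analysis on two time scales: pitch from waveform
# autocorrelation (milliseconds), tempo from onset-envelope autocorrelation
# (seconds). Assumes a few seconds of mono audio x at sample rate fs.
import numpy as np

def pitch_hz(x: np.ndarray, fs: int, fmin=60.0, fmax=500.0) -> float:
    """Estimate fundamental frequency from the autocorrelation peak."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)             # lag search range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def tempo_bpm(x: np.ndarray, fs: int, hop=512) -> float:
    """Estimate tempo from periodicity of a coarse energy-onset envelope."""
    frames = len(x) // hop
    env = np.array([np.sum(x[i * hop:(i + 1) * hop] ** 2) for i in range(frames)])
    onset = np.maximum(np.diff(env), 0.0)               # energy increases only
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    fr = fs / hop                                       # envelope frame rate
    lo, hi = int(fr * 60 / 240), int(fr * 60 / 40)      # 40-240 BPM range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return 60.0 * fr / lag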
“…Chisaki et al. 4 suggest giving more importance to bins with higher signal energy, because a higher SNR can be expected for those. COMPaSS uses a similar weighting of the frequency bins based on signal energy and the achieved similarity values.…”
Section: Filter Scoring and Extraction
Citation type: mentioning (confidence: 99%)
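The energy-based weighting quoted above can be sketched as follows: high-energy bins are trusted more when per-bin direction decisions are aggregated. This is an assumed, simplified form of the idea; it does not reproduce the exact weighting used in COMPaSS or by Chisaki et al.

# Sketch of energy-weighted aggregation of per-bin direction decisions:
# each bin's vote is weighted by its signal energy instead of counting
# every bin equally, since higher-energy bins should have higher SNR.
import numpy as np

def weighted_direction_votes(best_dir_per_bin, bin_energy, n_directions):
    """Aggregate per-bin direction indices into an energy-weighted histogram
    and return the winning direction index."""
    weights = bin_energy / (bin_energy.sum() + 1e-12)   # normalize energies
    votes = np.zeros(n_directions)
    np.add.at(votes, best_dir_per_bin, weights)         # weighted histogram
    return int(votes.argmax())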
“…It was developed especially for speech sources and has been used as a front end for a speech recognition system. 7 Chisaki et al. 4 showed that FDBM is capable of localizing two concurrent sound sources in azimuth and elevation with high accuracy.…”
Sound source localization algorithms determine the physical position of a sound source with respect to a listener. For practical applications, a localization algorithm has to take into account real-world conditions such as multiple active sources, reverberation, and noise. The application can impose additional constraints on the algorithm, e.g., a requirement for low latency. This work defines the most important constraints for practical applications, introduces an algorithm that tries to fulfill all of these requirements as well as possible, and compares it to state-of-the-art sound source localization approaches.
An acoustic VR system provides a three-dimensional acoustical sensation of an existing sound field (such as a concert hall, stadium, or disaster site) and/or an imaginary sound field (as in movies or video games), independent of time and space. This chapter introduces the system configuration and applications of acoustic VR systems.

13.1 System Configuration

An example of a typical configuration of an acoustic VR system is shown in Fig. 13.1. The system consists of hardware (a PC, a digital audio interface, headphones, earplug-type microphones, and a head tracker), software for signal processing, and a database (HRTFs and pinna shapes). The main function of the acoustic VR system is to reproduce, through headphones, the ear-input signals obtained in an arbitrary sound field by means of the signal processing described in Chap. 12. This signal processing (convolution between a sound-source signal and HRIRs) is performed on the PC. Another function is to change the HRTFs to those of another direction in response to the head movement of a listener; a head tracker is used to capture the direction of the listener's head. Rewriting of the HRTFs must be completed within the detection threshold for system delay, i.e., 80 ms (Yairi et al. 2005). Various systems have adopted an individualization function for HRTFs to ensure accurate sound-image localization. An example system is shown in Fig. 13.2. The external specifications of the acoustic VR system developed in the author's laboratory, the Sound Image Reproduction system with Individualized HRTF, graphical User interface, and Successive head-movement tracking (SIRIUS), are shown in Table 13.1. This system runs on a Windows PC. An HRIR database (response length: 512 samples) is stored on the PC. A sound-source signal and HRIRs are convolved in real time in order to control the direction and distance of a sound image.
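A minimal sketch of the signal path just described is given below: a mono source is convolved with the left/right HRIR pair for the current direction, and the HRIR pair is switched when the head tracker reports a new orientation. The database layout, block-based processing, and nearest-direction lookup are simplifying assumptions; the actual SIRIUS implementation, including the required sub-80 ms HRTF rewriting and any crossfading between HRIR pairs, is not reproduced here.

# Sketch of real-time binaural rendering: convolve a mono block with the
# HRIR pair for the current direction; the head tracker callback selects
# the nearest stored direction. Database format is an assumption.
import numpy as np
from scipy.signal import fftconvolve

class BinauralRenderer:
    def __init__(self, hrir_db):
        # hrir_db: dict mapping a quantized direction (az, el) in degrees
        # to a pair of 512-sample impulse responses (assumed format).
        self.db = hrir_db
        self.direction = (0, 0)

    def on_head_tracker(self, az: int, el: int):
        """Called by the head tracker; selects the nearest stored HRIR pair.
        In a real system this switch must complete within ~80 ms."""
        self.direction = min(self.db, key=lambda d: (d[0] - az) ** 2 + (d[1] - el) ** 2)

    def render_block(self, mono_block: np.ndarray):
        """Convolve one block of the source with the current HRIR pair."""
        hrir_l, hrir_r = self.db[self.direction]
        left = fftconvolve(mono_block, hrir_l, mode="full")
        right = fftconvolve(mono_block, hrir_r, mode="full")
        return left, right   # tails must be overlap-added across blocks

Block-wise FFT convolution with overlap-add keeps latency low enough for interactive head tracking; a production system would also crossfade between the old and new HRIR outputs to avoid clicks when the direction changes.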