Institute of Electrical and Electronics Engineers (IEEE); Cobos
Abstract: The Steered Response Power-Phase Transform (SRP-PHAT) algorithm has been shown to be one of the most robust sound source localization approaches operating in noisy and reverberant environments. However, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue. In this paper, we introduce an effective strategy that extends the conventional SRP-PHAT functional with the aim of considering the volume surrounding the discrete locations of the spatial grid. As a result, the modified functional performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid. To this end, the Generalized Cross-Correlation (GCC) function corresponding to each microphone pair must be properly accumulated according to the defined microphone setup. Experiments carried out under different acoustic conditions confirm the validity of the proposed approach.
Index Terms: sound source localization, SRP-PHAT, microphone array.
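The idea of accumulating each pair's GCC over the region around a grid point, instead of reading it at a single lag, can be illustrated with a small 2-D sketch in Python/NumPy. Everything below is an illustrative assumption, not the paper's implementation: a noiseless free-field simulation, square cells instead of 3-D volumes, a 343 m/s speed of sound, 16 kHz sampling, and TDOA interval bounds approximated from each cell's centre and corners rather than derived exactly. `srp_conventional` samples GCC-PHAT at one rounded lag per point; `srp_volume` sums it over every lag the cell can produce, so a coarse grid still captures a peak that falls between grid points.

```python
import numpy as np

C = 343.0   # speed of sound in m/s (assumed)
FS = 16000  # sampling rate in Hz (assumed)


def gcc_phat(x1, x2, nfft):
    """GCC-PHAT of two frames; zero lag is shifted to index nfft // 2 (nfft even)."""
    cross = np.fft.rfft(x1, nfft) * np.conj(np.fft.rfft(x2, nfft))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    return np.fft.fftshift(np.fft.irfft(cross, nfft))


def tdoa_samples(points, m1, m2):
    """TDOA in samples between mics m1 and m2 for points of shape (..., dim)."""
    d1 = np.linalg.norm(points - m1, axis=-1)
    d2 = np.linalg.norm(points - m2, axis=-1)
    return (d1 - d2) / C * FS


def srp_conventional(gccs, pairs, mics, grid, nfft):
    """Classic SRP-PHAT: read each pair's GCC at the point's rounded TDOA."""
    power = np.zeros(len(grid))
    for r, (i, j) in zip(gccs, pairs):
        lag = np.rint(tdoa_samples(grid, mics[i], mics[j])).astype(int)
        power += r[lag + nfft // 2]
    return power


def srp_volume(gccs, pairs, mics, grid, spacing, nfft):
    """Volume-based SRP sketch: sum each GCC over the TDOA interval spanned by
    the square cell around every grid point. The interval is approximated from
    the cell centre and corners (a simplification, not an exact bound)."""
    h = spacing / 2
    offsets = np.array([[0, 0], [-h, -h], [-h, h], [h, -h], [h, h]])
    power = np.zeros(len(grid))
    for r, (i, j) in zip(gccs, pairs):
        taus = tdoa_samples(grid[:, None, :] + offsets, mics[i], mics[j])
        lo = np.floor(taus.min(axis=1)).astype(int) + nfft // 2
        hi = np.ceil(taus.max(axis=1)).astype(int) + nfft // 2
        csum = np.concatenate(([0.0], np.cumsum(r)))  # prefix sums of the GCC
        power += csum[hi + 1] - csum[lo]  # sum of r[lo..hi], inclusive
    return power


def simulate(src, mics, n, seed=0):
    """Noiseless anechoic signals built with exact circular fractional delays."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1 / FS)
    delays = np.linalg.norm(mics - src, axis=1) / C
    return [np.fft.irfft(spec * np.exp(-2j * np.pi * freqs * t), n) for t in delays]


mics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
src = np.array([1.3, 2.7])
nfft = 4096
sigs = simulate(src, mics, nfft)
pairs = [(i, j) for i in range(len(mics)) for j in range(i + 1, len(mics))]
gccs = [gcc_phat(sigs[i], sigs[j], nfft) for i, j in pairs]

axis = np.arange(0.25, 4.0, 0.5)  # coarse 0.5 m grid (cell centres)
coarse = np.array([[x, y] for x in axis for y in axis])
best_coarse = coarse[np.argmax(srp_volume(gccs, pairs, mics, coarse, 0.5, nfft))]

fine_axis = np.arange(0.0, 4.0001, 0.05)  # fine 5 cm grid for the classic functional
fine = np.array([[x, y] for x in fine_axis for y in fine_axis])
best_fine = fine[np.argmax(srp_conventional(gccs, pairs, mics, fine, nfft))]
```

Here the coarse search evaluates 64 cells against 6,561 points for the fine grid, and the prefix sums keep each interval accumulation O(1) per pair.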
Wireless acoustic sensor networks (WASNs) are formed by a distributed group of acoustic-sensing devices featuring audio playing and recording capabilities. Current mobile computing platforms offer great possibilities for the design of audio-related applications involving acoustic-sensing nodes. In this context, acoustic source localization is one of the application domains that have attracted the most attention from the research community over the last decades. In general terms, the localization of acoustic sources can be achieved by studying energy, temporal, and/or directional features of the incoming sound at different microphones and using a suitable model that relates those features to the spatial location of the source (or sources) of interest. This paper reviews common approaches for source localization in WASNs that are focused on different types of acoustic features, namely, the energy of the incoming signals, their time of arrival (TOA) or time difference of arrival (TDOA), the direction of arrival (DOA), and the steered response power (SRP) resulting from combining multiple microphone signals. Additionally, we discuss methods aimed not only at localizing acoustic sources but also at locating the nodes themselves in the network. Finally, we discuss current challenges and frontiers in this field.
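As an example of the energy-feature route mentioned in the review, the sketch below fits a simple propagation model to node energies by grid search. It assumes an inverse-square decay E_i = S / ||q - m_i||^2, identical and calibrated node gains, and no noise term; these modelling choices and the function name `energy_localize` are illustrative, not taken from the review.

```python
import numpy as np


def energy_localize(energies, nodes, grid):
    """Grid search under an assumed inverse-square model E_i = S / ||q - m_i||^2,
    with identical, calibrated node gains and no noise term."""
    d2 = np.sum((grid[:, None, :] - nodes[None, :, :]) ** 2, axis=-1)  # (points, nodes)
    g = 1.0 / np.maximum(d2, 1e-9)  # predicted energy per unit source power
    s = (g @ energies) / np.sum(g * g, axis=1)  # least-squares source power per point
    err = np.sum((energies - s[:, None] * g) ** 2, axis=1)
    best = int(np.argmin(err))
    return grid[best], s[best]


nodes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0], [2.0, 2.0]])
src, power = np.array([1.0, 3.0]), 5.0
energies = power / np.sum((nodes - src) ** 2, axis=1)  # noiseless synthetic readings

axis = np.arange(0.0, 4.01, 0.25)
grid = np.array([[x, y] for x in axis for y in axis])
est, est_power = energy_localize(energies, nodes, grid)
```

Estimating the unknown source power in closed form at every candidate point keeps the search two-dimensional; with real readings, gain calibration and noise would dominate the error.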
Noise pollution caused by vehicular traffic is a common problem in urban environments and has been shown to affect people's health and children's cognition. In the last decade, several studies have assessed this noise by measuring the equivalent continuous sound pressure level (Leq) to acquire accurate sound maps using wireless acoustic sensor networks. However, even at similar Leq values, people can perceive noise differently depending on its frequency characteristics. Thus, indexes that capture people's subjective perception are required. In this paper, we analyze the suitability of using the psychoacoustic metrics given by Zwicker's model instead of considering only Leq. The goal is to evaluate the hardware limitations of a low-cost wireless acoustic sensor network used to measure annoyance, using two types of commercial off-the-shelf sensor nodes: Tmote-Invent nodes and Raspberry Pi platforms. Moreover, to compute the parameters on these platforms, different simplifications of Zwicker's model based on the specific features of road traffic noise are proposed. To validate the different alternatives, the aforementioned nodes are tested in a traffic-congested area of the city of Valencia in vertical and horizontal network deployments. The results show that Raspberry Pi platforms are a feasible low-cost alternative for increasing spatio-temporal resolution, whereas the suitability of Tmote-Invent nodes could not be confirmed due to their limited memory and calibration issues.
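For reference, the baseline metric the abstract contrasts with Zwicker's model is Leq = 10 log10(mean(p^2) / p0^2) with p0 = 20 µPa. The sketch below assumes a calibrated pressure signal in pascals and applies no frequency weighting (sound maps usually report A-weighted levels, which would require a filter not shown here). It also shows how Leq values from equal-length intervals are combined energetically rather than arithmetically.

```python
import numpy as np

P_REF = 20e-6  # reference pressure in Pa (20 µPa)


def leq(pressure):
    """Equivalent continuous sound pressure level (dB) of a calibrated,
    unweighted pressure signal in pascals, over the whole signal."""
    return 10.0 * np.log10(np.mean(np.square(pressure)) / P_REF**2)


def combine_leq(levels):
    """Energetic average of Leq values measured over equal-length intervals."""
    return 10.0 * np.log10(np.mean(10.0 ** (np.asarray(levels) / 10.0)))


fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)  # 1 Pa peak amplitude, 1 s
print(round(leq(tone), 2))  # 90.97
print(round(combine_leq([60.0, 70.0]), 2))  # 67.4
```

Note that averaging 60 dB and 70 dB gives about 67.4 dB, not 65 dB, because the louder interval dominates the energy.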
Source localization using the steered response power (SRP) usually requires a costly grid-search procedure. To address this issue, a modified SRP algorithm was recently introduced, providing improved robustness when using coarser spatial grids. In this letter, an iterative method based on the modified SRP is presented. A coarse spatial grid is initially evaluated with the modified SRP, selecting the point with the highest accumulated value. Then, its corresponding volume is iteratively decomposed by using a finer spatial grid. Experiments have shown that this method provides almost the same accuracy as the fine-grid search with a substantial reduction of functional evaluations.
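The coarse-to-fine decomposition can be sketched generically: evaluate an n x n grid of cells, keep the best cell, split it again, and repeat. In the sketch below the cell functional is a hypothetical stand-in (the mass of a Gaussian bump inside each cell, computed with `math.erf`) rather than the modified SRP itself; the 4 x 4 split and five iterations are also illustrative assumptions.

```python
import math

import numpy as np


def cell_centres(center, size, n):
    """Centres and side length of an n x n grid of cells tiling a square cell."""
    step = size / n
    offs = (np.arange(n) - (n - 1) / 2) * step
    return np.array([[center[0] + dx, center[1] + dy] for dx in offs for dy in offs]), step


def iterative_search(cell_score, center, size, n=4, iters=5):
    """Evaluate an n x n grid of cells, keep the best one, and decompose it
    again with a finer grid; returns the final cell centre and #evaluations."""
    evals = 0
    for _ in range(iters):
        cells, step = cell_centres(center, size, n)
        scores = [cell_score(c, step) for c in cells]
        evals += len(cells)
        center, size = cells[int(np.argmax(scores))], step
    return center, evals


def gaussian_cell_score(target, sigma):
    """Toy volume functional (hypothetical stand-in for the modified SRP):
    mass of an isotropic Gaussian centred at `target` inside a square cell."""
    def mass_1d(lo, hi, mu):
        k = sigma * math.sqrt(2.0)
        return 0.5 * (math.erf((hi - mu) / k) - math.erf((lo - mu) / k))

    def score(c, size):
        h = size / 2
        return mass_1d(c[0] - h, c[0] + h, target[0]) * mass_1d(c[1] - h, c[1] + h, target[1])

    return score


est, evals = iterative_search(gaussian_cell_score((1.3, 2.7), 0.3), (2.0, 2.0), 4.0)
```

Starting from a 4 m square, five 4 x 4 refinements reach cells of about 3.9 mm with 80 evaluations, whereas a single grid at that resolution would need 1024 x 1024 points.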
Close-microphone techniques are extensively employed in many live music recordings, allowing for interference rejection and reducing the amount of reverberation in the resulting instrument tracks. However, despite the use of directional microphones, the recorded tracks are not completely free from source interference, a problem commonly known as microphone leakage. While source separation methods are potentially a solution to this problem, few approaches take into account the large amount of prior information available in this scenario. In fact, besides the special properties of close-microphone tracks, knowledge of the number and type of instruments making up the mixture can also be exploited to improve separation performance. In this paper, a nonnegative matrix factorization (NMF) method making use of all the above information is proposed. To this end, a set of instrument models is learned from a training database and incorporated into a multichannel extension of the NMF algorithm. Several options for initializing the algorithm are suggested, and their performance is evaluated on multiple music tracks and compared with other state-of-the-art approaches.
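The core idea of plugging pre-learned instrument models into NMF can be sketched in its simplest form: the spectral bases W stay fixed and only the activations H are estimated, after which soft masks split the mixture per instrument. This is a single-channel, Euclidean-cost simplification with standard Lee-Seung multiplicative updates; the paper's multichannel model, cost function, and initialization strategies are not reproduced, and `W_piano`/`W_violin` are hypothetical random bases standing in for trained models.

```python
import numpy as np


def separate(V, W_list, n_iter=200, eps=1e-12, seed=0):
    """Supervised NMF sketch: the stacked per-instrument bases stay fixed and
    only the activations H are updated (Euclidean cost, single channel)."""
    W = np.hstack(W_list)
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV, WtW = W.T @ V, W.T @ W
    errors = [np.linalg.norm(V - W @ H)]
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)  # Lee-Seung multiplicative update, keeps H >= 0
        errors.append(np.linalg.norm(V - W @ H))
    # Wiener-like soft masks: each instrument's share of the model spectrogram
    model = W @ H + eps
    estimates, start = [], 0
    for Wk in W_list:
        k = Wk.shape[1]
        estimates.append(V * (Wk @ H[start:start + k]) / model)
        start += k
    return estimates, H, errors


rng = np.random.default_rng(1)
W_piano = rng.random((64, 5))  # hypothetical pre-learned spectral bases
W_violin = rng.random((64, 5))
V = np.hstack([W_piano, W_violin]) @ rng.random((10, 40))  # synthetic magnitude spectrogram
estimates, H, errors = separate(V, [W_piano, W_violin])
```

Because the masks sum to one, the per-instrument estimates add back up to the mixture; keeping W fixed is what lets prior knowledge of the instruments constrain the factorization.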
Localization of sounds in physical space plays a very important role in multiple audio-related disciplines, such as music, telecommunications, and audiovisual productions. Binaural recording is the most commonly used method to provide an immersive sound experience by means of headphone reproduction. However, it requires a very specific recording setup using high-fidelity microphones mounted in a dummy head. In this paper, we present a novel processing framework for binaural sound recording and reproduction that avoids the use of dummy heads and is especially suitable for immersive teleconferencing applications. The method is based on a time-frequency analysis of the spatial properties of the sound picked up by a simple tetrahedral microphone array, assuming source sparseness. Experiments carried out using simulations and a real-time prototype confirm the validity of the proposed approach.
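One ingredient such a time-frequency framework plausibly needs is a direction estimate per bin, relying on sparseness (one dominant source per bin). The sketch below fits a far-field plane-wave model to the inter-capsule phase differences of a regular tetrahedron by least squares; the 1.5 cm radius, the 343 m/s speed of sound, and the estimator itself are assumptions for illustration, not necessarily the method used in the paper. The resulting azimuth and elevation could then select head-related transfer functions for binaural rendering (not shown).

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def tetra_mics(radius=0.015):
    """Capsule positions of a regular tetrahedron (hypothetical 1.5 cm radius)."""
    v = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
    return radius * v / np.sqrt(3.0)


def bin_doa(x_bin, mics, freq):
    """Far-field DOA of the dominant source in one time-frequency bin.

    x_bin holds the complex STFT values of the 4 capsules at `freq` (Hz).
    Assumes one dominant plane wave per bin and no phase wrapping."""
    dphi = np.angle(x_bin[1:] * np.conj(x_bin[0]))  # phase relative to capsule 0
    baselines = mics[1:] - mics[0]  # three independent baselines
    # Model: dphi_i = 2*pi*freq * (m_i - m_0) . u / C, solved for the unit vector u
    u, *_ = np.linalg.lstsq(baselines, dphi * C / (2 * np.pi * freq), rcond=None)
    u /= np.linalg.norm(u)
    azimuth = np.degrees(np.arctan2(u[1], u[0]))
    elevation = np.degrees(np.arcsin(np.clip(u[2], -1.0, 1.0)))
    return u, azimuth, elevation


mics = tetra_mics()
u_true = np.array([1.0, 2.0, 0.5]) / np.linalg.norm([1.0, 2.0, 0.5])
freq = 1000.0
# Synthetic plane wave: capsules nearer the source see a phase advance
x_bin = 2.0 * np.exp(1j * 0.7) * np.exp(2j * np.pi * freq * (mics @ u_true) / C)
u, az, el = bin_doa(x_bin, mics, freq)
```

The small array keeps inter-capsule phase differences below pi at 1 kHz; at higher frequencies spatial aliasing would require a different estimator.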
Abstract: Automatic ranging and self-positioning are very desirable properties in wireless acoustic sensor networks (WASNs) where nodes have at least one microphone and one loudspeaker. However, due to environmental noise, interference, and multipath effects, audio-based ranging is a challenging task. This paper presents a fast ranging and positioning strategy that exploits the correlation properties of pseudo-noise (PN) sequences to estimate relative times of arrival (TOAs) from multiple acoustic nodes simultaneously. To this end, a test signal design adapted to the acoustic node transducers is proposed. In addition, a novel self-interference reduction method and a peak matching algorithm are introduced, allowing for increased accuracy in indoor environments. Synchronization issues are removed by following a BeepBeep strategy, providing range estimates that are converted to absolute node positions by means of multidimensional scaling (MDS). The proposed approach is evaluated with both simulated and real experiments under different acoustic conditions. The results using a real network of smartphones and laptops confirm the validity of the proposed approach, reaching an average ranging accuracy below 1 centimeter.
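The last two stages of this pipeline can be sketched independently of the PN-based TOA estimation, which is taken as given here (the test-signal design, self-interference reduction, and peak matching are not reproduced). In the BeepBeep scheme, each device timestamps both beeps on its own clock, so the distance follows from differences of local timestamps and the clock offsets cancel; classical MDS then turns the pairwise distance matrix into coordinates, up to rotation, reflection, and translation. The timing values and node layout below are synthetic.

```python
import numpy as np


def beepbeep_distance(t_a1, t_a3, t_b1, t_b3, d_aa, d_bb, c=343.0):
    """BeepBeep ranging from local timestamps.

    t_a1, t_a3: arrival times at A (A's clock) of A's own beep and of B's beep.
    t_b1, t_b3: arrival times at B (B's clock) of A's beep and of B's own beep.
    d_aa, d_bb: speaker-to-microphone distance on devices A and B."""
    return 0.5 * c * ((t_a3 - t_a1) - (t_b3 - t_b1)) + 0.5 * (d_aa + d_bb)


def classical_mds(dist, dim=2):
    """Classical MDS: coordinates (up to rotation, reflection and translation)
    from a matrix of pairwise distances."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    b = -0.5 * j @ (dist ** 2) @ j  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]  # keep the `dim` largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


# Two devices 2 m apart, with arbitrary (unknown) clock offsets off_a and off_b
c, dist_ab, d_aa, d_bb = 343.0, 2.0, 0.05, 0.04
emit_a, emit_b, off_a, off_b = 0.10, 0.35, 12.3, -4.7
t_a1 = emit_a + d_aa / c + off_a
t_a3 = emit_b + dist_ab / c + off_a
t_b1 = emit_a + dist_ab / c + off_b
t_b3 = emit_b + d_bb / c + off_b
print(round(beepbeep_distance(t_a1, t_a3, t_b1, t_b3, d_aa, d_bb), 6))  # 2.0

nodes = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 2.0], [2.5, 2.5], [1.0, 1.2]])
dist = np.linalg.norm(nodes[:, None] - nodes[None, :], axis=-1)
rel = classical_mds(dist)
```

`rel` reproduces every pairwise distance of `nodes`; anchoring it to absolute coordinates would need extra reference information, such as known node positions.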
The generalized cross-correlation (GCC) is regarded as the most popular approach for estimating the time difference of arrival (TDOA) between the signals received at two sensors. Time delay estimates are obtained by maximizing the GCC output, where the direct-path delay is usually observed as a prominent peak. Moreover, GCCs also play an important role in steered response power (SRP) localization algorithms, where the SRP functional can be written as an accumulation of the GCCs computed from multiple sensor pairs. Unfortunately, the accuracy of TDOA estimates is affected by multiple factors, including noise, reverberation, and signal bandwidth. In this paper, a sub-band approach for time delay estimation aimed at improving the performance of the conventional GCC is presented. The proposed method is based on the extraction of multiple GCCs corresponding to different frequency bands of the cross-power spectrum phase in a sliding-window fashion. The major contributions of this paper include: 1) a sub-band GCC representation of the cross-power spectrum phase that, despite having a reduced temporal resolution, provides a more suitable representation for estimating the true TDOA; 2) this matrix representation is shown to be rank one in the ideal noiseless case, a property that is exploited in more adverse scenarios to obtain a more robust and accurate GCC; 3) a set of low-rank approximation alternatives for processing the sub-band GCC matrix, leading to better TDOA estimates and source localization performance. An extensive set of experiments is presented to demonstrate the validity of the proposed approach.
Index Terms: Time delay estimation, GCC, SVD, weighted SVD, sub-band processing, SRP-PHAT.
M. Cobos is with the
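One way to see the rank-one property: if each band of B bins of the cross-power spectrum phase is shifted to baseband and inverse-transformed with a short B-point IFFT, then for a pure delay every band equals a unit-modulus constant times the same sequence, so all rows of the resulting matrix are scaled copies of one lag kernel. The sketch below builds such a matrix and uses the magnitude of the dominant right singular vector as a denoised lag profile. The band width, hop, and this particular low-rank choice are illustrative assumptions; the paper proposes several alternatives, including weighted SVD.

```python
import numpy as np


def subband_gcc_matrix(x1, x2, band=64, hop=32):
    """Sub-band GCC sketch: slide a window of `band` bins over the positive-
    frequency half of the cross-power spectrum phase and take a short inverse
    FFT of each segment (lag resolution: len(x1) / band samples)."""
    n = len(x1)
    cross = np.fft.fft(x1) * np.conj(np.fft.fft(x2))
    phase = cross / (np.abs(cross) + 1e-12)  # cross-power spectrum phase (PHAT)
    starts = range(0, n // 2 - band + 1, hop)
    return np.array([np.fft.ifft(phase[k0:k0 + band]) for k0 in starts])


def rank_one_tdoa(g, n):
    """Rank-one approximation via SVD; the magnitude of the dominant right
    singular vector serves as the lag profile. Returns (TDOA in samples, s)."""
    _, s, vh = np.linalg.svd(g, full_matrices=False)
    profile = np.abs(vh[0])
    band = g.shape[1]
    idx = int(np.argmax(profile))
    lag_bins = idx if idx < band // 2 else idx - band  # undo circular wrap
    return lag_bins * n / band, s


rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
x1 = np.roll(s, 48)  # x1 lags s by 48 samples (circular, noiseless)
g = subband_gcc_matrix(x1, s)
tdoa, sv = rank_one_tdoa(g, len(s))
print(tdoa)  # 48.0
print(sv[1] / sv[0] < 1e-8)  # True: rank one in the ideal case
```

With 64-bin bands over a 1024-sample frame, each lag bin spans 16 samples, which illustrates the reduced temporal resolution the abstract mentions; with noise and reverberation the matrix is no longer rank one, and the low-rank approximation acts as a denoiser.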