We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the image source model for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers; and finally, reference implementations of popular algorithms for beamforming, direction finding, and adaptive filtering. Together, they form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step.
We extend frequency-domain blind source separation based on independent vector analysis to the case where there are more microphones than sources. The signal is modelled as non-Gaussian sources in a Gaussian background. The proposed algorithm is based on a parametrization of the demixing matrix decreasing the number of parameters to estimate. Furthermore, orthogonal constraints between the signal and background subspaces are imposed to regularize the separation. The problem can then be posed as a constrained likelihood maximization. We propose efficient alternating updates guaranteed to converge to a stationary point of the cost function. The performance of the algorithm is assessed on simulated signals. We find that the separation performance is on par with that of the conventional determined algorithm at a fraction of the computational cost.
We present the concept of an acoustic rake receiver-a microphone beamformer that uses echoes to improve the noise and interference suppression. The rake idea is wellknown in wireless communications; it involves constructively combining different multipath components that arrive at the receiver antennas. Unlike spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts. Therefore, we focus on the spatial structure, rather than temporal. Instead of explicitly estimating the channel, we create correspondences between early echoes in time and image sources in space. These multiple sources of the desired and the interfering signal offer additional spatial diversity that we can exploit in the beamformer design.We present several "intuitive" and optimal formulations of acoustic rake receivers, and show theoretically and numerically that the rake formulation of the maximum signal-to-interferenceand-noise beamformer offers significant performance boosts in terms of noise and interference suppression. Beyond signal-tonoise ratio, we observe gains in terms of the perceptual evaluation of speech quality (PESQ) metric for the speech quality. We accompany the paper by the complete simulation and processing chain written in Python. The code and the sound samples are available online at http://lcav.github.io/AcousticRakeReceiver/.
In this paper we present FRIDA-an algorithm for estimating directions of arrival of multiple wideband sound sources. FRIDA combines multi-band information coherently and achieves stateof-the-art resolution at extremely low signal-to-noise ratios. It works for arbitrary array layouts, but unlike the various steered response power and subspace methods, it does not require a grid search. FRIDA leverages recent advances in sampling signals with a finite rate of innovation. It is based on the insight that for any array layout, the entries of the spatial covariance matrix can be linearly transformed into a uniformly sampled sum of sinusoids.Index Terms-Direction of arrival, finite rate of innovation, subspace method, search-free, wideband sources
A new iterative low complexity algorithm has been presented for computing the Walsh-Hadamard transform (WHT) of an N dimensional signal with a K-sparse WHT, where N is a power of two and K = O(N α ), scales sublinearly in N for some 0 < α < 1. Assuming a random support model for the nonzero transform domain components, the algorithm reconstructs the WHT of the signal with a sample complexity O(K log 2 ( N K )), a computational complexity O(K log 2 (K) log 2 ( N K )) and with a very high probability asymptotically tending to 1.The approach is based on the subsampling (aliasing) property of the WHT, where by a carefully designed subsampling of the time domain signal, one can induce a suitable aliasing pattern in the transform domain. By treating the aliasing patterns as parity-check constraints and borrowing ideas from erasure correcting sparse-graph codes, the recovery of the nonzero spectral values has been formulated as a belief propagation (BP) algorithm (peeling decoding) over an sparse-graph code for the binary erasure channel (BEC). Tools from coding theory are used to analyze the asymptotic performance of the algorithm in the "very sparse" (α ∈ (0, 1 3 ]) and the "less sparse" regime (α ∈ ( 1 3 , 1)).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.