The paper discusses the application of convolutional neural networks (CNNs) to minimum variance distortionless response (MVDR) localization schemes. We investigate the direction of arrival (DOA) estimation problem in noisy and reverberant conditions using an uniform linear array (ULA). CNNs are used to process the multichannel data from the ULA and to improve the data fusion scheme which is performed in the steered response power (SRP) computation. CNNs improve the incoherent frequency fusion of the narrowband response power by weighting the components, reducing the deleterious effects of those components affected by artifacts due to noise and reverberation. The use of CNNs avoids the necessity of previously encoding the multichannel data into selected acoustic cues with the advantage to exploit its ability in recognizing geometrical pattern similarity. Experiments with both simulated and real acoustic data demonstrate the superior localization performance of the proposed SRP beamformer with respect to other state-ofthe-art techniques.
Abstract-The steered response power (SRP) algorithms have been shown to be among the most effective and robust ones in noisy environments for direction of arrival (DOA) estimation. In broadband signal applications, the SRP methods typically perform their computations in the frequency-domain by applying a fast Fourier transform (FFT) on a signal portion, calculating the response power on each frequency bin, and subsequently fusing these estimates to obtain the final result. We introduce a frequency response incoherent fusion method based on a normalized arithmetic mean (NAM). Experiments are presented that rely on the SRP algorithms for the localization of motor vehicles in a noisy outdoor environment, focusing our discussion on performance differences with respect to different signal-tonoise ratios (SNR), and on spatial resolution issues for closely spaced sources. We demonstrate that the proposed fusion method provides higher resolution for the delay-and-sum SRP, and improved performances for minimum variance distortionless response (MVDR) and multiple signal classification (MUSIC).Index Terms-Broadband steered response power, incoherent frequency fusion, normalized arithmetic mean, direction of arrival estimation, microphone array.
The steered response power phase transform (SRP-PHAT) is a beamformer method very attractive in acoustic localization applications due to its robustness in reverberant environments. This paper presents a spatial grid design procedure, called the geometrically sampled grid (GSG), which aims at computing the spatial grid by taking into account the discrete sampling of time difference of arrival (TDOA) functions and the desired spatial resolution. A SRP-PHAT localization algorithm based on the GSG method is also introduced. The proposed method exploits the intersections of the discrete hyperboloids representing the TDOA information domain of the sensor array, and projects the whole TDOA information on the space search grid. The GSG method thus allows one to design the sampled spatial grid which represents the best search grid for a given sensor array, it allows one to perform a sensitivity analysis of the array and to characterize its spatial localization accuracy, and it may assist the system designer in the reconfiguration of the array. Experimental results using both simulated data and real recordings show that the localization accuracy is substantially improved both for high and for low spatial resolution, and that it is closely related to the proposed power response sensitivity measure.
The purpose of this study is to explore the possibility for physically based mathematical models of the voice source to accurately reproduce inverse filtered glottal volume-velocity waveforms. A low-dimensional, self-oscillating model of the glottal source with waveform-matching properties is proposed. The model relies on a lumped mechano-aerodynamic scheme loosely inspired by the one- and multimass lumped models. The vocal folds are represented by a single mechanical resonator and a propagation line which takes into account the vertical phase differences. The vocal-fold displacement is coupled to the glottal flow by means of an aerodynamic driving block which includes a general parametric nonlinear component. The principal characteristics of the flow-induced oscillations are retained, and the overall model is able to match inverse-filtered glottal flow signals. The method offers in principle the possibility of performing transformations of the glottal flow by acting on the physiologically based parameters of the model. This is a desirable property, e.g., for speech synthesis applications. The model was tested on a data set which included inverse-filtered glottal flow waveforms of different characteristics. The results demonstrate the possibility of reproducing natural speech waveforms with high accuracy, and of controlling important characteristics of the synthesis such as pitch.
Expression is an important aspect of music performance. It is the added value of a performance and is part of the reason that music is interesting to listen to and sounds alive. Understanding and modeling expressive content communication is important for many engineering applications in information technology. For example, in multimedia products, textual information is enriched by means of graphical and audio objects. In this paper, we present an original approach to modify the expressive content of a performance in a gradual way, both at the symbolic and signal levels. To this purpose, we discuss a model that applies a smooth morphing among performances with different expressive content, adapting the audio expressive character to the user’s desires. Morphing can be real- ized with a wide range of graduality (from abrupt to very smooth), allowing adaptation of the system to different situations. The sound rendering is obtained by interfacing the expressiveness model with a dedicated postprocessing environment, which allows for the transformation of the event cues. The processing is based on the organized control of basic audio effects. Among the basic effects used, an original method for the spectral processing of audio is introduced
In acoustic array processing, beamforming is a class of algorithms commonly used to estimate the position of a radiating sound source. This paper presents a diagonal unloading (DU) transformation method for the conventional response power beamforming to achieve robust localization with low computational complexity. The transformation is obtained by subtracting an opportune diagonal matrix from the covariance matrix of the array output vector. Specifically, the DU beamformer aims at subtracting the signal subspace from the noisy signal space. It is hence a data-dependent covariance matrix conditioning method. We show how to calculate precisely the unloading parameters, and we present a comparison of the proposed DU beamforming, the robust minimum variance distortionless response (MVDR) filter and the multiple signal classification (MUSIC) method, in terms of their respective eigenanalyses. Theoretical analysis and experiments conducted on both simulated and real acoustic data demonstrate that the DU beamformer localization performance is comparable to that of robust MVDR and MUSIC. Since its computational cost is equivalent to that of a conventional beamformer, the proposed DU beamformer method can thus be very attractive due to its effectiveness and computational efficiency.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.