Massive convolution is the basic operation in multichannel acoustic signal processing, a field that has experienced major development in recent years. One reason for this has been the increase in the number of sound sources used in playback applications available to users; another is the growing need to incorporate new effects and to improve the listening experience. Massive convolution requires high computing capacity. GPUs offer the possibility of parallelizing these operations, which allows us to obtain the processing result in a much shorter time and to free up CPU resources. One important aspect for real-time applications lies in overlapping the transfer of data from CPU to GPU, and vice versa, with the computation. In this way, 3D sound scenes could be synthesized in a peer-to-peer music streaming environment using an ordinary GPU in your computer, while the CPU is used for other tasks. Nowadays, these effects are obtained in theaters or funfairs at very high cost, requiring a large quantity of resources. Thus, our work focuses on two main points: describing an efficient massive convolution implementation and incorporating this task into real-time multichannel sound applications.
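The core operation the abstract refers to, filtering many channels at once in the frequency domain, can be sketched in NumPy (a CPU sketch of the idea only; the actual implementation described runs batched FFT convolutions on the GPU):

```python
import numpy as np

def fft_convolve_multichannel(x, h):
    """Convolve each input channel with its filter via the FFT.

    x: (channels, n) input signals; h: (channels, m) impulse responses.
    Returns the full linear convolution, length n + m - 1, per channel.
    """
    n = x.shape[1] + h.shape[1] - 1
    nfft = 1 << (n - 1).bit_length()     # next power of two >= n
    X = np.fft.rfft(x, nfft)             # batched FFTs, one per channel
    H = np.fft.rfft(h, nfft)
    return np.fft.irfft(X * H, nfft)[:, :n]   # pointwise product, inverse FFT
```

On a GPU the per-channel FFTs and the pointwise products are the naturally parallel parts, and the host-to-device copies of `x` can be overlapped with the computation of the previous block, as the abstract suggests.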
Belloch Rodríguez, J.A.; Vidal Maciá, A.M.; Cobos Serrano, M. (2015). On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone arrays. Expert Systems with Applications, 42(13). Abstract: Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human-machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. Graphics Processing Units (GPUs) are highly parallel programmable co-processors that provide massive computation when the needed operations are properly parallelized. Emerging GPUs offer multiple parallelism levels; however, properly managing their computational resources becomes a very challenging task. In fact, management issues become even more difficult when multiple GPUs are involved, adding one more level of parallelism. In this paper, the performance of an acoustic source localization system using distributed microphones is analyzed over a massive multichannel processing framework in a multi-GPU system.
The paper evaluates and points out the influence that the number of microphones and the available computational resources have on overall system performance. Several acoustic environments are considered to show the impact that noise and reverberation have on localization accuracy, and how the use of massive microphone systems combined with parallelized GPU algorithms can substantially mitigate adverse acoustic effects. In this context, the proposed implementation is able to work in real time with high-resolution spatial grids and up to 48 microphones. These results confirm the advantages of suitable GPU architectures in the development of real-time massive acoustic signal processing systems.
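The SRP-PHAT grid search the abstract describes can be illustrated with a naive NumPy sketch: for every candidate grid point, sum the GCC-PHAT cross-correlation of each microphone pair at the lag implied by that point's time difference of arrival. This is only a didactic CPU version; the paper's contribution is precisely the parallelization of this search on (multi-)GPU systems.

```python
import numpy as np

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Naive SRP-PHAT: return the grid point of maximum steered power.

    signals: (M, N) microphone signals; mic_pos: (M, 3) positions in m;
    grid: (G, 3) candidate source positions; fs: sample rate in Hz.
    """
    M, N = signals.shape
    S = np.fft.rfft(signals, 2 * N)          # zero-padded spectra
    scores = np.zeros(len(grid))
    for i in range(M):
        for j in range(i + 1, M):
            cross = S[i] * np.conj(S[j])
            cross /= np.abs(cross) + 1e-12   # PHAT weighting
            cc = np.fft.irfft(cross)         # circular GCC-PHAT, length 2N
            for g, p in enumerate(grid):
                tdoa = (np.linalg.norm(p - mic_pos[i])
                        - np.linalg.norm(p - mic_pos[j])) / c
                lag = int(round(tdoa * fs)) % (2 * N)  # wrap negative lags
                scores[g] += cc[lag]
    return grid[np.argmax(scores)]
```

The triple loop makes the parallelization opportunity obvious: every (pair, grid-point) contribution is independent, which is what maps well to thousands of GPU threads.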
Multichannel acoustic signal processing has undergone major development in recent years. The incorporation of spatial information into an immersive audiovisual virtual environment or into video games provides a better sense of "presence" in applications. Spatial sound consists of reproducing audio signals with spatial cues (spatial information embedded in the sound) through headphones. This spatial information allows the listener to identify the virtual positions of the sources corresponding to different sounds. Headphone-based spatial sound is obtained by filtering different sound sources through special filters called Head-Related Transfer Functions (HRTFs) prior to rendering them through headphones. Efficient computation plays an important role when the number of sources to be managed is high. This situation increases the number of filtering operations, requiring high computing capacity, especially when the virtual sources are moving. Graphics Processing Units (GPUs) are highly parallel programmable co-processors that provide massive computation when the needed operations are properly parallelized. This paper discusses the design, implementation, and performance of a headphone-based spatial audio application whose processing is carried out entirely on a GPU. This application is able to interact with the listener, who can select and change the location of the sound sources in real time. This work also analyzes specific computational aspects of the CUDA environment in order to successfully exploit GPU resources. Results show that the proposed application is able to move up to 2500 sources simultaneously, while leaving CPU resources free for other tasks. This work emphasizes the importance of analyzing all CUDA aspects, since they can influence the performance drastically.
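The HRTF-based rendering described above amounts to convolving each source with the left- and right-ear impulse responses for its virtual position and summing the results. A minimal NumPy sketch of this mixing step (time-domain and single-threaded, unlike the paper's batched GPU version):

```python
import numpy as np

def render_binaural(sources, hrirs):
    """Mix mono sources into a binaural (left, right) pair.

    sources: (S, N) mono source signals.
    hrirs:   (S, 2, L) head-related impulse responses, one
             left/right pair per source's virtual position.
    Returns a (2, N + L - 1) stereo signal.
    """
    S, N = sources.shape
    L = hrirs.shape[2]
    out = np.zeros((2, N + L - 1))
    for s in range(S):
        for ear in range(2):                       # 0 = left, 1 = right
            out[ear] += np.convolve(sources[s], hrirs[s, ear])
    return out
```

With thousands of moving sources, these per-source convolutions (and the HRIR crossfades needed when a source moves) are the operations a GPU implementation batches and parallelizes.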
Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications, which involve the processing of multiple sources, channels, or filters. A general scenario that appears in this context is the immersive reproduction of binaural audio without the use of headphones, which requires the use of a crosstalk canceler. However, Generalized Crosstalk Cancellation and Equalization (GCCE) requires high computing capacity, which is a considerable limitation for real-time applications. This paper discusses the design and implementation of all the processing blocks of a multichannel convolution on a GPU for real-time applications. To this end, a very efficient filtering method using specific data structures is proposed, which takes advantage of overlap-save filtering and filter fragmentation. It has been shown that, for a real-time application with 22 inputs and 64 outputs, the system is capable of managing 1408 filters of 2048 coefficients with a latency time of less than 6 ms. The proposed GPU implementation can be easily adapted to any acoustic environment, demonstrating the validity of these co-processors for managing intensive multichannel audio applications.
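The overlap-save filtering that the proposed method builds on can be sketched for a single channel in NumPy (the paper's version fragments the filters and runs many such channels in parallel on the GPU; this is only the underlying block-filtering idea):

```python
import numpy as np

def overlap_save(x, h, block=256):
    """Stream one channel through an FIR filter with overlap-save.

    Each iteration FFTs a segment that overlaps the previous one by
    len(h) - 1 samples and keeps only the alias-free output samples.
    """
    m = len(h)
    nfft = block + m - 1
    H = np.fft.rfft(h, nfft)
    xp = np.concatenate([np.zeros(m - 1), x])        # prime the overlap
    y = []
    for start in range(0, len(x), block):
        seg = xp[start:start + nfft]
        if len(seg) < nfft:
            seg = np.pad(seg, (0, nfft - len(seg)))  # zero-pad final block
        yb = np.fft.irfft(np.fft.rfft(seg) * H, nfft)
        y.append(yb[m - 1:])                         # discard aliased samples
    return np.concatenate(y)[:len(x)]
```

The block size trades latency against FFT efficiency, which is why a low-latency system such as the 6 ms one above must process short blocks very quickly.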
Wave Field Synthesis (WFS) is a multichannel audio reproduction method of considerable computational cost that renders an accurate spatial sound field using a large number of loudspeakers to emulate virtual sound sources. The rendering of moving sound sources can be improved by using fractional delay filters, and room reflections can be compensated by using an inverse filter bank that corrects the room effects at selected points within the listening area. However, both the fractional delay filters and the room compensation filters further increase the computational requirements of the WFS system. This paper analyzes the performance of a WFS system composed of 96 loudspeakers which integrates both strategies. In order to deal with the large computational complexity, we explore the use of a graphics processing unit (GPU) as a massive signal co-processor to increase the capabilities of the WFS system. The performance of the method, as well as the benefits of the GPU acceleration, is demonstrated by considering different sizes of room compensation filters and fractional delay filters of order 9. The results show that a 96-speaker WFS system that is efficiently implemented on a state-of-the-art GPU can handle room compensation filters having more than 4,000 coefficients each.
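A common way to build the order-9 fractional delay filters mentioned above is Lagrange interpolation, which yields a short FIR whose coefficients depend on the desired non-integer delay. The abstract does not state which design the authors use, so the following NumPy sketch assumes the standard Lagrange construction:

```python
import numpy as np

def lagrange_fd(delay, order=9):
    """FIR coefficients of a Lagrange-interpolation fractional delay.

    h[n] = prod_{k != n} (delay - k) / (n - k), for n = 0..order.
    Accuracy is best when the total delay lies near order / 2;
    order 9 matches the filters considered in the abstract.
    """
    n = np.arange(order + 1)
    h = np.ones(order + 1)
    for k in range(order + 1):
        mask = n != k
        h[mask] *= (delay - k) / (n[mask] - k)
    return h
```

Because the coefficients are cheap to recompute, a moving virtual source can update its fractional delay every block, which is the operation a WFS renderer repeats for each of its 96 loudspeaker feeds.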
A graphic equalizer is an adjustable filter in which the command gain of each frequency band is practically independent of the gains of the other bands. Designing a graphic equalizer with high precision requires evaluating a target response that interpolates the magnitude response at several frequency points between the command gains. Good accuracy has previously been achieved by using polynomial interpolation methods such as cubic Hermite or spline interpolation. However, these methods require large computational resources, which is a limitation in real-time applications. This paper proposes an efficient way of computing the target response without sacrificing approximation accuracy. This new approach, called Linear Interpolation with Constant Segments (LICS), reduces the computing time of the target response by 55% and has an intrinsically parallel structure. The performance of the LICS method is assessed on an ARM Cortex-A7 core, which is commonly used in embedded systems.
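The abstract does not detail the LICS construction, so the following NumPy sketch is only a hypothetical illustration of the idea its name suggests: replace spline evaluation of the target response with linear interpolation of the command gains, held constant over a small number of segments per band, so each evaluation point reduces to a table lookup. The function name, the `segments` parameter, and the log-frequency axis are all assumptions, not the paper's actual algorithm.

```python
import numpy as np

def lics_target(cmd_freqs, cmd_gains_db, eval_freqs, segments=2):
    """Hypothetical LICS-style target response (sketch).

    Command gains are linearly interpolated on a log-frequency axis,
    then held constant over `segments` steps per band interval, so no
    polynomial is evaluated per frequency point.
    """
    logf = np.log10(eval_freqs)
    logc = np.log10(cmd_freqs)
    # coarse grid: `segments` constant steps between adjacent bands
    edges = np.concatenate(
        [np.linspace(logc[i], logc[i + 1], segments, endpoint=False)
         for i in range(len(logc) - 1)] + [logc[-1:]])
    coarse = np.interp(edges, logc, cmd_gains_db)      # linear interpolation
    idx = np.searchsorted(edges, logf, side="right") - 1
    return coarse[np.clip(idx, 0, len(coarse) - 1)]    # constant per segment
```

Each output value is independent of the others, which reflects the intrinsically parallel structure the abstract attributes to LICS.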