Giovanni Pepe scite author profile

Audio equalization is an active research topic that aims to improve the audio quality of a loudspeaker system by correcting its overall frequency response with linear filters. Estimating the filter coefficients is not an easy task, especially in binaural and multipoint scenarios, because multiple impulse responses contribute to each listening point. This paper presents a deep learning approach for tuning filter coefficients that employs three different neural network architectures: the Multilayer Perceptron, the Convolutional Neural Network, and the Convolutional Autoencoder. Suitable loss functions are proposed for each architecture and are formulated in terms of spectral Euclidean distance. The experiments were conducted in an automotive scenario with several loudspeakers and microphones. The results show that deep learning techniques outperform the baseline methods, achieving an almost flat magnitude frequency response.

Detection of activity and position of speakers by using deep neural networks and acoustic data augmentation

Vecchiotti

Expert Systems with Applications

Principi

et al. 2019

The task of Speaker LOCalization (SLOC) has been the focus of numerous works in which SLOC is performed on pure speech data, requiring the presence of an Oracle Voice Activity Detection (VAD) algorithm. However, this ideal working condition is not satisfied in a real-world scenario, where the employed VADs do commit errors. This work addresses the issue with an extensive analysis of the relationship between several data-driven VAD and SLOC models, and finally proposes a reliable framework for VAD and SLOC. The effectiveness of the approach is assessed in a multi-room scenario, which is close to a real-world environment. To the best of the authors' knowledge, only one other contribution proposes a single framework for VAD and SLOC in this scenario; however, that solution does not rely on data-driven approaches. This work extends the authors' previous research on the VAD and SLOC tasks by proposing numerous advancements to the original neural network architectures. In detail, four different models based on convolutional neural networks (CNNs) are tested in order to highlight the advantages of the introduced novelties. In addition, two different CNN models are studied for SLOC. Furthermore, the training of the data-driven models is improved through a specific data augmentation technique. During this procedure, the room impulse responses (RIRs) of two virtual rooms are generated from knowledge of the room size, the reverberation time, and the placement of microphones and sources. Finally, the only other framework for simultaneous detection and localization in a multi-room scenario is taken into account for a fair comparison with the proposed method. As a result, the proposed method proves to be more accurate than the baseline framework, and remarkable improvements are especially observed when the data

Evolutionary tuning of filters coefficients for binaural audio equalization

et al. 2020

Deep Learning for Individual Listening Zone

et al. 2020

Road Type Classification Using Acoustic Signals: Deep Learning Models and Real-Time Implementation

Principi

et al. 2020

Gravitational Search Algorithm for IIR Filter-Based Audio Equalization

et al. 2021

Digital Filters Design for Personal Sound Zones: a Neural Approach

et al. 2022

Deep Optimization of Parametric IIR Filters for Audio Equalization

IEEE/ACM Trans. Audio Speech Lang. Process.

et al. 2022