The synthesis of binaural signals from spherical microphone array recordings has recently been proposed. The limited spatial resolution of the reproduced signal due to order-limited reproduction has previously been investigated perceptually, revealing consequences for spatial perception such as poor source localization and limited externalization. Furthermore, this spatial order limitation also has a detrimental effect on the frequency content of the signal and its perceived timbre, due to a rapid roll-off at high frequencies. In this paper, the underlying causes of this spectral roll-off are described mathematically and investigated numerically. A digital filter that equalizes the frequency spectrum of a low spatial order signal is introduced and evaluated. A comprehensive listening test was conducted to study the influence of the filter on the perception of the reproduced sound. Results indicate that the suggested filter is beneficial for restoring the timbral composition of order-truncated binaural signals, while conserving, and even improving, some spatial properties of the signal.
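The high-frequency roll-off described in this abstract can be illustrated with a deliberately simplified model. The sketch below is not the filter from the paper: it assumes a unit plane wave observed on an open (free-field) sphere of radius r, whereas the paper's derivation may use a different head or array model. It relies on the identity that the sum over all orders n of (2n+1) j_n(kr)^2 equals 1, so truncating the sum at order N leaves less than the full energy once kr exceeds roughly N. The function names `spherical_jn_upto`, `truncated_energy`, and `eq_gain_db` are hypothetical and introduced only for illustration.

```python
import math


def spherical_jn_upto(max_order, x):
    """Return [j_0(x), ..., j_max_order(x)] for x > 0.

    Uses closed forms for j_0 and j_1 and the upward recurrence
    j_{n+1}(x) = (2n + 1) / x * j_n(x) - j_{n-1}(x).
    Assumption: max_order is small (not much larger than x), since the
    upward recurrence loses accuracy for orders far above x.
    """
    j = [math.sin(x) / x]
    if max_order >= 1:
        j.append(math.sin(x) / x**2 - math.cos(x) / x)
    for n in range(1, max_order):
        j.append((2 * n + 1) / x * j[n] - j[n - 1])
    return j


def truncated_energy(max_order, kr):
    """Energy of an order-truncated unit plane wave, averaged over the sphere.

    The untruncated sum equals 1, so values below 1 indicate energy lost
    to truncation (open-sphere assumption).
    """
    jn = spherical_jn_upto(max_order, kr)
    return sum((2 * n + 1) * value**2 for n, value in enumerate(jn))


def eq_gain_db(max_order, kr):
    """Gain in dB that would restore the full diffuse-field energy at this kr."""
    return -10.0 * math.log10(truncated_energy(max_order, kr))


if __name__ == "__main__":
    # Show how the required boost grows with kr for a first-order signal.
    for kr in (0.5, 2.0, 5.0, 10.0):
        print(f"kr = {kr:4.1f}  boost = {eq_gain_db(1, kr):5.2f} dB")
```

In this toy model the required boost is negligible while kr stays below the truncation order and grows quickly above it, which matches the qualitative behaviour the abstract attributes to order truncation. A practical equalizer would additionally need a gain limit and a model of scattering by the head.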
The perceptual evaluation of spatial audio systems may be based on singular auditory qualities such as the localization accuracy or the perception of coloration, on overall criteria of perceptual accuracy such as plausibility and authenticity, or on detailed catalogues of auditory qualities. However, only the latter is suited to characterizing a simulation's technical shortcomings perceptually and allowing for its focused improvement. Therefore, a common vocabulary containing all perceptual attributes that are relevant in this context appears desirable. Existing vocabularies for the evaluation of sound field synthesis, spatialization technologies and virtual acoustic environments were often generated ad hoc by authors, or were focused only on specific perceptual aspects or specific spatialization techniques. To overcome limitations with respect to the relevance and completeness of these vocabularies, we have developed a Spatial Audio Quality Inventory (SAQI) for the perceptual evaluation of all spatial audio technologies used for the (re)synthesis of acoustic environments. It is a consensus vocabulary comprising 48 verbal descriptors of auditive qualities assumed to be of practical relevance when comparing (re)synthesized sound fields to real or imagined references or amongst each other. The vocabulary was generated by a Focus Group of 21 German-speaking experts for virtual acoustics. Five additional experts helped verify the unambiguity of all descriptors and the related explanations. Moreover, an English translation was generated and verified by eight bilingual experts. This article describes the applied methodology and presents the English version of the final vocabulary.
A round robin was conducted to evaluate the state of the art of room acoustic modeling software in both the physical and perceptual realms. The test was based on six acoustic scenes highlighting specific acoustic phenomena and on three complex, “real-world” spatial environments. The results demonstrate that most present simulation algorithms generate obvious model errors once the assumptions of geometrical acoustics are no longer met. As a consequence, they are neither able to provide a reliable pattern of early reflections nor do they provide a reliable prediction of room acoustic parameters outside a medium frequency range. In the perceptual domain, the algorithms under test could generate mostly plausible but not authentic auralizations, i.e., the difference between simulated and measured impulse responses of the same scene was always clearly audible. Most relevant for this perceptual difference are deviations in tone color and source position between measurement and simulation, which to a large extent can be traced back to the simplified use of random incidence absorption and scattering coefficients and to shortcomings in the simulation of early reflections due to the missing or insufficient modeling of diffraction.
2019). "A Cross-validated database of measured and simulated HRTFs including 3D head meshes and anthropometric features." J. Audio Eng. Soc.
Head-related transfer functions (HRTFs) were acoustically measured and numerically simulated for the FABIAN head and torso simulator on a full-spherical and high-resolution sampling grid. Moreover, HRTFs were acquired for 11 horizontal head-above-torso orientations, covering the typical range of motion of ±50°, making it possible to account for head movements of the listeners in dynamic binaural auralizations in a physically correct manner. In the absence of an external reference for HRTFs, measured and simulated data sets were cross-validated by applying auditory models for localization performance and spectral coloration and by correlation analyses. The results indicate a high degree of similarity between the two data sets regarding all tested aspects, thus suggesting that they are free of systematic errors. The HRTF database is publicly available from https://doi.org/10.14279/depositonce-5718.2 and is accompanied by a wide range of headphone filters for use in binaural synthesis.
A simulation that is perceptually indistinguishable from the corresponding real sound field could be termed authentic. Using binaural technology, such a simulation would theoretically be achieved by reconstructing the sound pressure at a listener's ears. However, inevitable errors in the measurement, rendering, and reproduction introduce audible degradations, as has been demonstrated in previous studies for anechoic environments and static binaural simulations (fixed head orientation). The current study investigated the authenticity of individual dynamic binaural simulations for three different acoustic environments (anechoic, dry, wet) using a highly sensitive listening test design. The results show that about half of the participants failed to reliably detect any differences for a speech stimulus, whereas all participants were able to do so for pulsed pink noise. Higher detection rates were observed in the anechoic condition than in the reverberant spaces, while the source position had no significant effect. It is concluded that authenticity mainly depends on how comprehensively the audio content provides spectral cues and on the amount of reverberation, whereas the source position plays a minor role. This is confirmed by a broad qualitative evaluation, suggesting that remaining differences mainly affect the tone color rather than the spatial, temporal or dynamical qualities.