For the realization of auditory augmented reality (AAR), it is important that the room acoustical properties of the virtual elements are perceived in agreement with the acoustics of the actual environment. This perceptual matching of room acoustics is the subject reviewed in this paper. Realizations of AAR that fulfill the listeners' expectations have been achieved through pre-characterization of the room acoustics, for example, by measuring acoustic impulse responses or creating detailed room models for acoustic simulations. For future applications, the goal is to realize an online adaptation in (close to) real-time. Perfect physical matching is hard to achieve under these practical constraints. For this reason, an understanding of the essential psychoacoustic cues is of interest and will help to explore options for simplification. This paper reviews a broad selection of previous studies and derives a theoretical framework to examine possibilities for psychoacoustical optimization of room acoustical matching.
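The pre-characterization described above typically yields a measured room impulse response (RIR) that is convolved with a dry source signal to impose the room's acoustics on virtual content. A minimal sketch of this auralization step, assuming a hypothetical measured RIR stored as a NumPy array:

```python
import numpy as np

def auralize(dry_signal: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Impose a room's acoustics on a dry (anechoic) source signal by
    convolving it with a measured room impulse response (RIR)."""
    return np.convolve(dry_signal, rir)

# Toy illustration: a unit impulse through a two-tap "room" consisting
# of the direct sound plus one attenuated reflection.
dry = np.array([1.0, 0.0, 0.0])
rir = np.array([1.0, 0.5])
wet = auralize(dry, rir)   # the RIR appears in the output, as expected
```

In practice the convolution is done block-wise in the frequency domain (e.g. partitioned convolution) so that long measured RIRs can be applied with low latency; the direct `np.convolve` call here is only for illustration.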
For spatial audio reproduction in the context of augmented reality, a position-dynamic binaural synthesis system can be used to synthesize the ear signals for a moving listener. The goal is the fusion of the auditory perception of the virtual audio objects with the real listening environment. Such a system has several components, each of which helps to enable a plausible auditory simulation. For each possible position of the listener in the room, a set of binaural room impulse responses (BRIRs) congruent with the expected auditory environment is required to avoid room divergence effects. An adequate and efficient approach is to synthesize new BRIRs from very few measurements of the listening room. The required spatial resolution of the BRIR positions can be estimated from spatial auditory perception thresholds. Retrieving and processing the tracking data of the listener's head pose and position, as well as convolving BRIRs with an audio signal, needs to be done in real-time. This contribution presents work done by the authors, covering several technical components of such a system in detail. It shows how the individual components are affected by psychoacoustics. Furthermore, the paper discusses the perceptual effects by means of listening tests demonstrating the appropriateness of the approaches.
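The core loop of such a position-dynamic system selects, for each tracked listener position, the BRIR pair measured (or synthesized) closest to that position and convolves it with the source signal for each ear. A minimal sketch of that selection-and-convolution step, with a hypothetical two-point BRIR grid (real systems interpolate or crossfade between BRIRs and run the convolution in partitioned, low-latency form):

```python
import numpy as np

def select_brir(brir_grid, listener_pos):
    """Pick the BRIR pair whose measurement position is nearest to the
    tracked listener position.
    brir_grid: list of (position, (left_ir, right_ir)) tuples."""
    dists = [np.linalg.norm(np.asarray(p) - np.asarray(listener_pos))
             for p, _ in brir_grid]
    return brir_grid[int(np.argmin(dists))][1]

def binauralize(signal, brir):
    """Convolve the source signal with the left- and right-ear impulse
    responses to produce the two ear signals."""
    left_ir, right_ir = brir
    return np.stack([np.convolve(signal, left_ir),
                     np.convolve(signal, right_ir)])

# Hypothetical grid: two measured positions with single-tap "BRIRs".
grid = [((0.0, 0.0), (np.array([1.0]), np.array([0.5]))),
        ((1.0, 0.0), (np.array([0.5]), np.array([1.0])))]

# A listener tracked near (1, 0) gets the second position's BRIR pair.
ears = binauralize(np.array([1.0, 1.0]),
                   select_brir(grid, (0.9, 0.1)))
```

Nearest-neighbor selection is the simplest strategy; the perceptually required grid spacing is exactly what the spatial auditory thresholds mentioned above constrain.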
AUGMENTED OR MIXED REALITY (AR/MR) is emerging as one of the key technologies in the future of computing. Audio cues are critical for maintaining a high degree of realism, social connection, and spatial awareness for various AR/MR applications such as teaching and training, gaming, remote work, education, and virtual social gatherings. Motivated by a wide variety of AR/MR listening experiences delivered over hearables, this feature article systematically reviews the integration of fundamental and advanced signal processing techniques for AR/MR audio to equip the researchers and engineers in the signal processing community for the next wave of AR/MR.
I. Introduction: AR/MR audio experience over hearables
Alternate reality technologies aim to provide a feeling of presence to humans through engaging multi-modal content. We define AR/MR as an alternate reality experience achieved by the seamless fusion of the reproduced virtual content with the real-world stimuli that can be modified as desired. This definition expands on AR's previous usage, where the experiences provided were limited to the overlay of virtual content in the real world. It can also include other alternate reality technologies, such as virtual reality (VR), where users can get transported to a virtual environment. The immersive AR/MR technologies have demonstrated substantial benefits in various applications such as education and training, tourism, and remote working. We have seen significant progress in rendering the different modalities of AR/MR devices in the past few decades. In particular, the new lifestyle of reduced physical connection triggered by the COVID-19 pandemic has made such needs even more critical than ever before. This paper focuses on sound (or audio), an inherent part of our everyday lives for communication, social interactions, and situational awareness.
Unlike vision, where the field of view is limited, natural listening always spans a 360° range. The pervasive spatial perception of sound can be critical in a wide variety of high-stress situations, such as warnings for approaching vehicles or public announcements in case of emergencies, where the visual cues may not be enough. Even in day-to-day cases, such as conversations among a group of people, we rely on audio cues to direct our attention towards a particular speaker, referred to as the cocktail party effect. Audio devices today can be broadly classified into two major categories: speakers and headphones. Speakers are widely used to play back audio content for multimedia applications, where a fixed configuration of speakers can provide the