AUGMENTED OR MIXED REALITY (AR/MR) is emerging as one of the key technologies in the future of computing. Audio cues are critical for maintaining a high degree of realism, social connection, and spatial awareness for various AR/MR applications such as teaching and training, gaming, remote work, education, and virtual social gatherings. Motivated by a wide variety of AR/MR listening experiences delivered over hearables, this feature article systematically reviews the integration of fundamental and advanced signal processing techniques for AR/MR audio to equip the researchers and engineers in the signal processing community for the next wave of AR/MR.
I. Introduction: AR/MR audio experience over hearables

Alternate reality technologies aim to provide a feeling of presence to humans through engaging multi-modal content. We define AR/MR as an alternate reality experience achieved by the seamless fusion of reproduced virtual content with real-world stimuli that can be modified as desired. This definition expands on the previous usage of AR, where the experiences provided were limited to the overlay of virtual content on the real world. It can also include other alternate reality technologies, such as virtual reality (VR), where users are transported to a virtual environment.

Immersive AR/MR technologies have demonstrated substantial benefits in various applications such as education and training, tourism, and remote working. The past few decades have seen significant progress in rendering the different modalities of AR/MR devices. In particular, the new lifestyle of reduced physical connection triggered by the COVID-19 pandemic has made such needs more critical than ever. This paper focuses on sound (or audio), an inherent part of our everyday lives for communication, social interactions, and situational awareness. Unlike vision, where the field of view is limited, natural listening always spans a 360° range.

The pervasive spatial perception of sound can be critical in a wide variety of high-stress situations, such as warnings of approaching vehicles or public announcements during emergencies, where visual cues may not be enough. Even in day-to-day cases, such as conversations among a group of people, we rely on audio cues to direct our attention towards a particular speaker, an ability referred to as the cocktail party effect.

Audio devices today can be broadly classified into two major categories: speakers and headphones. Speakers are widely used to play back audio content for multimedia applications, where a fixed configuration of speakers can provide the