Direction of arrival (DOA) estimation for speech sources is an important task in audio signal processing. This task becomes a challenge in reverberant environments, which are typical of real scenarios. Several methods of DOA estimation for speech sources have been developed recently in an attempt to overcome the effect of reverberation. One effective approach aims to identify time-frequency bins in the short-time Fourier transform domain that are dominated by the direct sound. This approach was shown to be particularly well suited to spherical arrays, with processing in the spherical harmonics domain. Recent examples are the direct-path dominance (DPD) test and a method based on the directivity of the sound field. While these methods seem to perform well, high reverberation conditions may degrade their performance. In this paper, the structure of the spatial correlation matrix is comprehensively studied, showing that under some well-defined conditions, the DOA of the direct sound can be correctly extracted from its dominant eigenvector, even when contaminated by reflections. This new insight leads to the development of a new test, performing an enhanced decomposition of the direct sound (EDS), denoted the DPD-EDS test. The proposed test is compared to previous DPD tests, and to other recently proposed reverberation-robust methods, using computer simulations and an experimental study, demonstrating its potential advantage. The studies include multiple speakers in highly reverberant environments and therefore represent challenging real-life acoustic scenes.
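The role of the dominant eigenvector of the spatial correlation matrix can be illustrated with a minimal sketch. This is not the DPD-EDS test itself, whose details are not given here; it only shows the general idea of checking whether a single component dominates a local correlation matrix. The function name `dominance_ratio`, the four-channel linear-array steering vector, and all signal parameters are hypothetical choices made for illustration, and the spherical harmonics processing mentioned above is omitted.

```python
import numpy as np

def dominance_ratio(x):
    """x: (channels, snapshots) complex array of neighbouring TF bins.

    Returns the ratio of the largest to the second-largest eigenvalue of the
    local spatial correlation matrix, together with the dominant eigenvector.
    """
    R = x @ x.conj().T / x.shape[1]       # local spatial correlation matrix
    w, v = np.linalg.eigh(R)              # eigenvalues in ascending order
    return w[-1] / max(w[-2], 1e-12), v[:, -1]

rng = np.random.default_rng(0)
n_ch, n_snap = 4, 32

# Hypothetical steering vector of a half-wavelength linear array, source at 30 degrees.
a = np.exp(1j * np.pi * np.arange(n_ch) * np.sin(np.deg2rad(30)))
s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
noise = 0.05 * (rng.standard_normal((n_ch, n_snap))
                + 1j * rng.standard_normal((n_ch, n_snap)))

x_direct = np.outer(a, s) + noise         # bins dominated by one plane wave
x_diffuse = (rng.standard_normal((n_ch, n_snap))
             + 1j * rng.standard_normal((n_ch, n_snap)))  # no dominant direction

ratio_direct, u = dominance_ratio(x_direct)
ratio_diffuse, _ = dominance_ratio(x_diffuse)

# Normalised inner product between the steering vector and the dominant eigenvector.
alignment = abs(np.vdot(a, u)) / np.linalg.norm(a)

print(f"direct ratio:  {ratio_direct:.1f}")
print(f"diffuse ratio: {ratio_diffuse:.1f}")
print(f"alignment:     {alignment:.3f}")
```

In this toy setting, a large eigenvalue ratio flags bins where one source dominates, and the dominant eigenvector is then close to the steering vector of that source, which is what makes a DOA readable from it.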
The perception of sound in real-life acoustic environments, such as enclosed rooms or open spaces with reflective objects, is affected by reverberation. Hence, reverberation is extensively studied in the context of auditory perception, with many studies highlighting the importance of the direct sound for perception. Based on this insight, speech processing methods often use time-frequency (TF) analysis to detect TF bins that are dominated by the direct sound, and then use the detected bins to reproduce or enhance the speech signals. The detection of bins dominated by the direct sound is typically based on an objective measure, such as the direct-to-reverberant ratio (DRR). However, the relation between the DRR in the TF bins and the spatial perception of the reverberant sound that is reproduced from these bins is still not clear. This paper aims to provide some insights into this relation, specifically for reverberant speech, focusing on bins with high DRR. This is done using a listening experiment in which high-DRR bins within a reverberant speech signal were masked in the TF domain, based on various DRR thresholds. The results show that the percentage of high-DRR TF bins that were masked may better indicate the quality of spatial perception than the specific value of the DRR threshold. The insights from this work could be incorporated into spatial audio techniques that reproduce the direct sound of reverberant speech, and potentially improve spatial perception. This was illustrated with an implementation of directional audio coding, which was studied in an additional listening experiment that supported the previously described results.
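The masking of high-DRR bins described above can be sketched as follows. This is an illustrative sketch only: the way the DRR is estimated per TF bin is not specified here, so the DRR map is assumed to be given, and both the spectrogram and the DRR values below are synthetic. The function name `mask_high_drr` and the threshold values are hypothetical.

```python
import numpy as np

def mask_high_drr(stft, drr_db, threshold_db):
    """Zero every TF bin whose DRR (in dB) exceeds threshold_db.

    Returns the masked spectrogram and the percentage of bins that were masked.
    """
    mask = drr_db > threshold_db
    masked = np.where(mask, 0.0, stft)
    return masked, 100.0 * mask.mean()

rng = np.random.default_rng(1)

# Synthetic stand-ins: a complex spectrogram and a per-bin DRR map in dB.
stft = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
drr_db = rng.normal(0.0, 6.0, size=stft.shape)

# Percentage of bins masked at each (hypothetical) DRR threshold.
results = {thr: mask_high_drr(stft, drr_db, thr)[1] for thr in (0.0, 5.0, 10.0)}

for thr, pct in results.items():
    print(f"threshold {thr:5.1f} dB -> {pct:5.1f}% of bins masked")
```

Reporting the masked percentage alongside the threshold makes it possible to compare conditions by either quantity, which is the distinction the listening experiment draws.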