Binaural source localization using deep learning and head rotation information

Garcia-Barrios, Guillermo; Krause, Daniel; Politis, Archontis; Mesaros, Annamaria; Gutiérrez-Arriola, Juana M.; Fraile, Rubén

doi:10.23919/eusipco55093.2022.9909764

Cited by 4 publications

(8 citation statements)

References 23 publications

“…To the authors' best knowledge, no study has been done on regression-based binaural DOA estimation that would be tested for a large number of rooms and a full range of azimuth angles. We investigated binaural localization in a scenario with a rotating head, showing that head rotation information significantly improves the estimation precision [44]. This work is a direct continuation of that study.…”

Section: A. Sound Source Localization (mentioning)

confidence: 81%

“…Although they demonstrated the benefit of using motion-based cues in binaural DOAE systems, the investigation was limited to azimuth rotation angles comprised in the range of ±90°. In [44], we first proposed a DNN system that takes advantage of head movement information and explicitly estimates the DOA for an unlimited range of azimuth and elevation angles. In this study, we extend this approach to scenarios with a moving listener.…”

Section: Binaural DOAE With Head Rotation or a Moving Listener (mentioning)

confidence: 99%

“…Utilizing the sine and cosine values of phase differences produces a smoother representation compared with raw values and avoids phase wrapping. These features have been firstly proposed for multichannel DNN-based speech separation [68] and further investigated for localization in [44], [69]. On top of that, we utilize the ILDs, which constitute another major binaural cue that becomes important above 1.5 kHz due to the diminishing effect of IPDs related with the physical distance between the ears [70].…”

Section: A. Input Feature Extraction (mentioning)

confidence: 99%
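The features named in the statement above can be illustrated with a minimal sketch. It assumes the left-ear and right-ear signals have already been transformed to complex STFT bins; the function name `binaural_features` and the `eps` guard are hypothetical and are not taken from the cited papers.

```python
import cmath
import math


def binaural_features(left_spec, right_spec, eps=1e-12):
    """Return (sin_ipd, cos_ipd, ild_db) lists for paired complex STFT bins.

    Hypothetical helper for illustration only, not the cited authors' code.
    """
    sin_ipd, cos_ipd, ild_db = [], [], []
    for l, r in zip(left_spec, right_spec):
        # Interaural phase difference: phase of left times conjugate of right.
        ipd = cmath.phase(l * r.conjugate())
        # Sine and cosine stay continuous where the raw phase wraps at +/- pi.
        sin_ipd.append(math.sin(ipd))
        cos_ipd.append(math.cos(ipd))
        # Interaural level difference in dB; eps guards against empty bins.
        ild_db.append(20.0 * math.log10((abs(l) + eps) / (abs(r) + eps)))
    return sin_ipd, cos_ipd, ild_db


# Two bins whose raw IPDs sit on opposite sides of the wrap point.
left = [cmath.rect(1.0, 3.1), cmath.rect(1.0, -3.1)]
right = [cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.0)]
s, c, ild = binaural_features(left, right)
print(s, c, ild)
```

The two example bins have raw phase differences of 3.1 and -3.1 rad, which are far apart numerically but nearly the same angle; their cosine values coincide and their sine values differ only slightly, which is the smoothness property the statement refers to.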

“…The basic DNN architecture, depicted in Figure 1, is based on a model from our previous study [44], which was proven to perform efficiently for a DOAE system utilizing information about head rotation. Here, we introduce a few changes to tackle the new problems under investigation.…”

Section: B. Model Architecture (mentioning)

confidence: 99%

Binaural Sound Source Distance Estimation and Localization for a Moving Listener

Krause, García-Barrios, Politis et al. 2024

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

In this paper, we investigate the tasks of binaural source distance estimation (SDE) and direction-of-arrival estimation (DOAE) using motion-based cues in a scenario with a walking listener. On top of performing both tasks as separate problems, we study two methods of solving the joint task of simultaneous source distance estimation and localization (SDEL), with a single model. Experiments are conducted for three different scenarios: a static receiver; a static receiver with a rotating head; and a freely moving listener inside a room. The study proposes rotation and translation features to include information about the receiver's motion during model training and studies the effects of these on the final performance. The work includes extended simulation of three datasets containing numerous testing scenarios for sound sources, covering a wide range of DOAs and a source-to-receiver distance up to 15 m. Results are further analyzed with respect to room reverberation, walking speed, as well as source-to-receiver distance. The presented outcomes show large improvements in both DOA and distance estimation for a model that uses motion-based cues as compared with a static scenario. These include a decrease of 9.50° in DOA and 1.56 m in distance errors for a joint model, followed by 16.17° and 0.17 m for separate models.

“…The dataset used for experiments follows the same setup as in [37]. Briefly, anechoic speech recordings obtained from the TIMIT dataset [38] are convolved with the simulated omnidirectional RIRs from an image-source room simulator for shoebox geometries [39].…”

Section: A. Synthetic Dataset (mentioning)

confidence: 99%
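The data synthesis step described in the statement above (dry speech convolved with a room impulse response) can be sketched as follows. This is a toy illustration under stated assumptions: the `convolve` helper is hypothetical, the three-tap impulse response stands in for a simulated RIR, and no image-source simulator or TIMIT audio is involved.

```python
def convolve(signal, rir):
    """Full linear convolution of a dry signal with a room impulse response.

    Hypothetical helper for illustration; real pipelines typically use an
    FFT-based routine for long impulse responses.
    """
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            # Each input sample excites a scaled, delayed copy of the RIR.
            out[i + j] += s * h
    return out


# Toy RIR (assumed values): direct path plus one reflection two samples later.
toy_rir = [1.0, 0.0, 0.25]
dry = [1.0, 0.5]
wet = convolve(dry, toy_rir)
print(wet)
```

The reverberant output is longer than the dry input by the RIR length minus one sample, so a real pipeline would decide whether to keep or trim that tail.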

Speaker Distance Estimation in Enclosures From Single-Channel Audio

Neri, Politis, Krause et al. 2024

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

Distance estimation from audio plays a crucial role in various applications, such as acoustic scene analysis, sound source localization, and room modeling. Most studies predominantly center on employing a classification approach, where distances are discretized into distinct categories, enabling smoother model training and achieving higher accuracy but imposing restrictions on the precision of the obtained sound source position. Towards this direction, in this paper we propose a novel approach for continuous distance estimation from audio signals using a convolutional recurrent neural network with an attention module. The attention mechanism enables the model to focus on relevant temporal and spectral features, enhancing its ability to capture fine-grained distance-related information. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using audio recordings in controlled environments with three levels of realism (synthetic room impulse response, measured response with convolved speech, and real recordings) on four datasets (our synthetic dataset, QMULTI-MIT, VoiceHome-2, and STARSS23). Experimental results show that the model achieves an absolute error of 0.11 meters in a noiseless synthetic scenario. Moreover, the results showed an absolute error of about 1.30 meters in the hybrid scenario. The algorithm's performance in the real scenario, where unpredictable environmental factors and noise are prevalent, yields an absolute error of approximately 0.50 meters. For reproducible research purposes we make model, code, and synthetic datasets available at https://github.com/michaelneri/audio-distance-estimation.