A learning-based approach to robust binaural sound localization

Youssef, Karim; Argentieri, Sylvain; Zarader, Jean-Luc

doi:10.1109/iros.2013.6696771

Cited by 32 publications

(39 citation statements)

References 15 publications


“…One solution could consist in learning the head effect in realistic conditions. Such an idea was successfully assessed in [29] through a dedicated neural network able to generalize learning to new acoustic conditions. One can also cite [30], or [31], where the iCub humanoid robot's head was endowed with two pinnae.…”

Section: Horizontal Localization (mentioning)

confidence: 99%

A survey on sound source localization in robotics: From binaural to array processing methods

Argentieri

Danès

Souères

2015

Computer Speech & Language

133


This paper attempts to provide a state of the art of sound source localization in Robotics. Noticeably, this context raises original constraints (e.g. embeddability, real time, broadband environments, noise and reverberation) which are seldom simultaneously taken into account in Acoustics or Signal Processing. A comprehensive review is proposed of recent robotics achievements, be they binaural or rooted in Array Processing techniques. The connections are highlighted with the underlying theory as well as with elements of physiology and neurology of human hearing.



“…Finally, in what respect to the experimental setup, most works use simulated data either for training or for training and testing [44][45][46][47][48][49][50][51][52][54][55][56][57][58][59], usually by convolving clean (anechoic) speech with impulse responses (room, head related, or DOA related (azimuth, elevation)). Only some of them actually face real recordings [44,45,53,55,56], which in our opinion is a must to be able to assess the actual impact of the proposals in real conditions. So, in this paper we describe, for the first time in the literature to the best of our knowledge, a CNN architecture in which we directly exploit the raw acoustic signal to be provided to the neural network, with the objective of directly estimating the three dimensional position of an acoustic source in a given environment.…”

Section: State of the Art (mentioning)

confidence: 99%

Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signal to Source Position Coordinates

Vera-Diaz

Pizarro

Macías-Guarasa

2018

Preprint


This paper presents a novel approach for indoor acoustic source localization using microphone arrays and based on a Convolutional Neural Network (CNN). The proposed solution is, to the best of our knowledge, the first published work in which the CNN is designed to directly estimate the three dimensional position of an acoustic source, using the raw audio signal as the input information avoiding the use of hand crafted audio features. Given the limited amount of available localization data, we propose in this paper a training strategy based on two steps. We first train our network using semi-synthetic data, generated from close talk speech recordings, and where we simulate the time delays and distortion suffered in the signal that propagates from the source to the array of microphones. We then fine tune this network using a small amount of real data. Our experimental results show that this strategy is able to produce networks that significantly improve existing localization methods based on SRP-PHAT strategies. In addition, our experiments show that our CNN method exhibits better resistance against varying gender of the speaker and different window sizes compared with the other methods.


“…Alternatively, multidimensional features can be extracted as sets of frequency-dependent components, which require a signal frequency-dependent decomposition. This can be done through FFTs [2,34,35,36] or filterbanks, notably of gammatone filters [4,5,37,38].…”

Section: Speaker Localization (mentioning)

confidence: 99%

“…It undertakes tasks like sound processing and de-noising (speech, music, and environmental sounds), sound source localization, separation and identification. A growing field of CASA research is binaural computer audition [2][3][4][5]. Relying on signals acquired inside a human-like head and ears, it attempts to create a computational reproduction of the human auditory system stages.…”

Section: Introduction (mentioning)

confidence: 99%

Simultaneous Identification and Localization of Still and Mobile Speakers Based on Binaural Robot Audition

Youssef

Itoyama

Yoshii

2017

J. Robot. Mechatron.

Self Cite


[Figure: Efficient mobile speaker tracking]

This paper jointly addresses the tasks of speaker identification and localization with binaural signals. The proposed system operates in noisy and echoic environments and involves limited computations. It demonstrates that a simultaneous identification and localization operation can benefit from a common signal processing front end for feature extraction. Moreover, a joint exploitation of the identity and position estimation outputs allows the outputs to limit each other’s errors. Equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCC) and interaural level differences (ILD) are extracted. These acoustic features are respectively used for speaker identity and azimuth estimation through artificial neural networks (ANNs). The system was evaluated in simulated and real environments, with still and mobile speakers. Results demonstrate its ability to produce accurate estimations in the presence of noises and reflections. Moreover, the advantage of the binaural context over the monaural context for speaker identification is shown.
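As an illustration of the interaural level difference (ILD) feature named in the abstract above, here is a minimal sketch of a per-band ILD computation on one frame of a two-channel signal. It is not the authors' front end: it assumes equal-width FFT bands rather than an ERB-spaced filterbank, and the function name `ild_per_band`, the parameter `n_bands`, and the synthetic test signal are hypothetical choices made for this example only.

```python
import numpy as np

def ild_per_band(left, right, n_bands=8, eps=1e-12):
    """Return the ILD in dB for n_bands contiguous, equal-width FFT bands.

    Assumption: equal-width bands stand in for the ERB-spaced bands
    that a binaural front end would more typically use.
    """
    # Power spectra of the two ear signals (single frame, real FFT).
    p_left = np.abs(np.fft.rfft(left)) ** 2
    p_right = np.abs(np.fft.rfft(right)) ** 2
    # Group the FFT bins into contiguous bands and sum the power in each.
    e_left = np.array([band.sum() for band in np.array_split(p_left, n_bands)])
    e_right = np.array([band.sum() for band in np.array_split(p_right, n_bands)])
    # Positive values mean the left ear receives more energy in that band.
    return 10.0 * np.log10((e_left + eps) / (e_right + eps))

# Synthetic check: the right channel is the left channel at half amplitude,
# so every band should show the same level difference of 20*log10(2) dB.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
left, right = source, 0.5 * source
ild = ild_per_band(left, right)
```

In a learning-based localizer, a vector like `ild` would be one input feature per frame; how the bands are spaced and how frames are windowed are design choices this sketch leaves open.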

