“…To improve the robustness of DOA estimation, deep neural networks (DNNs) have been proposed to learn a mapping between signal features and a discretized DOA space [17][18][19][20][21]. Various features such as phasemaps [17,18] and GCC-PHAT [21] have been used as inputs.…”
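For intuition, the GCC-PHAT feature mentioned above can be sketched as follows. This is an illustrative implementation of the generalized cross-correlation with phase transform, not the exact pipeline of any cited paper; the function name and interface are our own:

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the time delay of arrival between two microphone signals
    using GCC-PHAT: cross-power spectrum whitened to keep phase only."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12            # PHAT weighting: discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                 # estimated delay in seconds

# Toy check: the second channel is the first delayed by 5 samples
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.roll(x, 5)
tau = gcc_phat(y, x, fs)
```

In DNN-based DOA systems, such cross-correlation features (or raw phase maps) are typically stacked per microphone pair and fed to the network, which classifies into the discretized DOA space.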
To cite this version: Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr. Keyword-based speaker localization: Localizing a target speaker in a multi-speaker environment. Interspeech 2018 - 19th
Abstract: Speaker localization is a hard task, especially in adverse environmental conditions involving reverberation and noise. In this work we introduce the new task of localizing the speaker who uttered a given keyword, e.g., the wake-up word of a distant-microphone voice command system, in the presence of overlapping speech. We employ a convolutional neural network based localization system and investigate multiple identifiers as additional inputs to the system in order to characterize this speaker. We conduct experiments using ground-truth identifiers, which are obtained assuming the availability of clean speech, and also in realistic conditions where the identifiers are computed from the corrupted speech. We find that the identifier consisting of the ground-truth time-frequency mask corresponding to the target speaker provides the best localization performance, and we propose methods to estimate such a mask in adverse reverberant and noisy conditions using the considered keyword.
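One common concrete instance of a "ground truth time-frequency mask" is the ideal ratio mask (IRM). The sketch below is our own simplification for illustration, not the paper's exact mask definition; the function name and array shapes are assumptions:

```python
import numpy as np

def ideal_ratio_mask(target_stft, interference_stft, eps=1e-12):
    """Per time-frequency-bin fraction of mixture energy due to the target.

    Values near 1 mark bins dominated by the target speaker; multiplying
    the mixture STFT by this mask emphasizes the target before computing
    localization features."""
    t_pow = np.abs(target_stft) ** 2
    i_pow = np.abs(interference_stft) ** 2
    return t_pow / (t_pow + i_pow + eps)

# Toy example: one bin with target magnitude 3 and interferer magnitude 4
mask = ideal_ratio_mask(np.array([3.0 + 0j]), np.array([4.0 + 0j]))
```

Computing this mask requires the clean target signal, which is why the abstract distinguishes ground-truth masks from masks estimated from the corrupted speech.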
“…Due to the nature of the human auditory system, machine-hearing approaches are often implemented in binaural localisation algorithms, typically using either Gaussian mixture models (GMMs) [9][10][11] or neural networks (NNs) [12][13][14][15]. In most cases, the data presented to the machine-hearing algorithm fit into one of two categories: binaural cues (ITD and ILD) or spectral cues.…”
Section: Introduction
“…In most cases, the data presented to the machine-hearing algorithm fit into one of two categories: binaural cues (ITD and ILD) or spectral cues. Previous machine-hearing approaches to binaural localisation have shown good results across the training data and, in some cases, good generalisability across unknown data from different datasets [9][10][11][12][13][14][15].…”
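As a concrete illustration of the first cue family, here is a minimal broadband sketch of ITD and ILD extraction. The names and the single-band simplification are our own: actual binaural models estimate these cues per auditory filterbank channel (e.g., after a gammatone filterbank), not broadband:

```python
import numpy as np

def binaural_cues(left, right, fs):
    """Broadband ITD via FFT cross-correlation and ILD as an energy ratio.

    Sign convention here: positive ITD means the right-ear signal lags the
    left (source toward the left); positive ILD means the left ear is louder."""
    n = len(left) + len(right)
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)
    cc = np.fft.irfft(R * np.conj(L), n=n)     # cross-correlation via FFT
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    itd = (np.argmax(cc) - max_shift) / fs     # seconds
    ild = 10.0 * np.log10(np.sum(left ** 2) / (np.sum(right ** 2) + 1e-12))
    return itd, ild

# Simulated source on the left: right ear delayed by 8 samples and quieter
fs = 16000
rng = np.random.default_rng(1)
left = rng.standard_normal(2048)
right = 0.5 * np.roll(left, 8)
itd, ild = binaural_cues(left, right, fs)
```

A GMM or NN localizer then takes per-band vectors of such cues as input and maps them to azimuth classes.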
Section: Introduction
“…Recent work by Ma et al. [15] compared the use of GMMs and deep NNs (DNNs) for the azimuthal DoA estimation task. The DNN made use of head rotation produced by a KEMAR unit fitted with a motorised head (KEMAR, the Knowles Electronics Manikin for Acoustic Research, is a head and torso simulator designed specifically for, and commonly used in, binaural acoustic research) [17].…”
Section: Introduction
“…This paper presents a novel approach for the spatial analysis of two-channel BRIRs, using a binaural model fronted NN to estimate the azimuthal direction of arrival of the direct sound and reflected components of the BRIRs (the direct sound refers to the signal emitted by a loudspeaker arriving at the receiver; a reflected component refers to a copy of the emitted signal arriving at the receiver after incidence with a reflective surface). It develops and extends the approach adopted in [15] in four respects: the processing used by the binaural model to extract the interaural cues; the use of a cascade-correlation neural network rather than a multi-layer perceptron to map the binaural cues to the direction-of-arrival classes; the nature of the sound components being analysed (short pulses relating to the direct sound and reflected components of a BRIR, as opposed to continuous speech signals); and the method by which measurement orientations are implemented and analysed by the NN. In this paper, multiple measurement orientations are presented simultaneously to the NN, whereas in [15], multiple orientations were presented as rotations produced by a motorised head, with the signals analysed separately by the NN, which allowed for active sound source localisation in an environment.…”
Abstract: Spatial impulse response analysis techniques are commonly used in the field of acoustics, as they help to characterise the interaction of sound with an enclosed environment. This paper presents a novel approach for spatial analyses of binaural impulse responses, using a binaural model fronted neural network. The proposed method uses binaural cues utilised by the human auditory system, which are mapped by the neural network to the azimuth direction of arrival classes. A cascade-correlation neural network was trained using a multi-conditional training dataset of head-related impulse responses with added noise. The neural network is tested using a set of binaural impulse responses captured using two dummy head microphones in an anechoic chamber, with a reflective boundary positioned to produce a reflection with a known direction of arrival. Results showed that the neural network was generalisable for the direct sound of the binaural room impulse responses for both dummy head microphones. However, it was found to be less accurate at predicting the direction of arrival of the reflections. The work indicates the potential of using such an algorithm for the spatial analysis of binaural impulse responses, while indicating where the method applied needs to be made more robust for more general application.