2019
DOI: 10.1109/jstsp.2019.2901664

Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals

Abstract: Supervised learning based methods for source localization, being data driven, can be adapted to different acoustic conditions via training and have been shown to be robust to adverse acoustic environments. In this paper, a convolutional neural network (CNN) based supervised learning method for estimating the direction-of-arrival (DOA) of multiple speakers is proposed. Multi-speaker DOA estimation is formulated as a multi-class multi-label classification problem, where the assignment of each DOA label to the in…
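The multi-label formulation in the abstract can be illustrated with a minimal Python/NumPy sketch. It assumes a hypothetical DOA grid of 0° to 180° in 5° steps (37 classes), one independent sigmoid output per class scored with binary cross-entropy, and a known number of sources at inference time. The helpers `encode_doas`, `binary_cross_entropy` and `decode_doas` are illustrative names, not code from the paper.

```python
import numpy as np

# Hypothetical DOA grid (assumption): 0 to 180 degrees in 5-degree steps -> 37 classes.
DOA_GRID = np.arange(0, 181, 5)


def encode_doas(doas_deg, grid=DOA_GRID):
    """Multi-hot target: set the class nearest to each active source's DOA to 1."""
    target = np.zeros(len(grid), dtype=np.float32)
    for doa in doas_deg:
        target[np.argmin(np.abs(grid - doa))] = 1.0
    return target


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over independent per-class sigmoid outputs."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


def decode_doas(posteriors, num_sources, grid=DOA_GRID):
    """Average frame-wise posteriors of shape (T, C) over time and keep the top-N classes."""
    avg = np.asarray(posteriors).mean(axis=0)
    top = np.argsort(avg)[::-1][:num_sources]
    return sorted(grid[top].tolist())
```

Because each class is scored independently, two simultaneous speakers simply produce two active entries in the target vector (for example, sources at 30° and 92° map to the 30° and 90° classes). Picking the N largest averaged posteriors is only one simple decision rule; the paper's own rule may differ.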

Cited by 266 publications (244 citation statements)
References 29 publications
“…Commonly used input features that have been used for deep learning based localization include phase spectrum [115], magnitude spectrum [118], and generalized cross-correlation between channels [117]. In general, source localization requires the use of interchannel information, which can also be learned by a deep neural network with a suitable topology from within-channel features, for example by convolutional layers [118] where the kernels span multiple channels.…”
Section: Applications (mentioning, confidence: 99%)
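The phase-spectrum input and the idea of convolution kernels that span several channels can be sketched as follows. This is an illustration under stated assumptions, not the cited method: a Hann-windowed STFT with a 256-sample frame and 128-sample hop (giving K = 129 frequency bins), and a naive "valid" cross-correlation standing in for a CNN layer. The functions `stft_frames`, `phase_maps` and `valid_conv2d` are hypothetical helpers.

```python
import numpy as np


def stft_frames(x, frame_len=256, hop=128):
    """One-sided STFT of a 1-D signal, shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(num_frames)])
    return np.fft.rfft(frames, axis=-1)


def phase_maps(signals, frame_len=256, hop=128):
    """Per-frame phase maps of shape (num_frames, M, K) for an (M, N) multichannel signal."""
    spectra = np.stack([stft_frames(ch, frame_len, hop) for ch in signals], axis=1)
    return np.angle(spectra)  # values in [-pi, pi]


def valid_conv2d(x, kernel):
    """Naive 'valid' 2-D cross-correlation (as computed by CNN layers) of an (M, K) map."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out
```

With the kernel `[[1], [-1]]`, which spans two microphone rows, each output entry is the phase difference between adjacent microphones at one frequency (ignoring phase wrapping). This is the kind of inter-channel cue a learned multi-channel kernel can extract from within-channel phase features.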
“…The performance of the proposed algorithm is compared with a recent CNN-based DOA estimation method proposed in [54] (subsequently denoted as "CNN-PH") where it was already shown that "CNN-PH" outperforms conventional parametric methods like MUSIC and SRP-PHAT. For a fair comparison, we kept the CNN architecture and other evaluation criteria [the] same in all possible ways.…”
Section: B. Baseline Methods and Evaluation Metrics (mentioning, confidence: 99%)
“…On the contrary, Adavanne et al. considered both magnitude and phase information of the STFT coefficients and used consecutive time frames to form the feature snapshot to train a convolutional recurrent neural network (CRNN) and performed a joint sound event detection and localization [55]. Both [54] and [55] require the model to be trained for unique combinations of sound sources from different angles in order to accurately estimate the DOA of simultaneously active multiple sound sources.…”
Section: A. Literature Review (mentioning, confidence: 99%)
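The magnitude-plus-phase feature snapshot described in this excerpt can be sketched by stacking |X| and ∠X of every microphone as separate input channels and cutting the frame axis into fixed-length sequences. The input layout (M, T, K), the name `magnitude_phase_snapshots` and the choice to drop leftover frames are assumptions for illustration, not details taken from [55].

```python
import numpy as np


def magnitude_phase_snapshots(stft, seq_len):
    """Stack |X| and angle(X) as channels and cut the frames into snapshots of seq_len.

    stft: complex array of shape (M, T, K) (microphones, frames, frequency bins).
    Returns a float array of shape (num_snapshots, 2 * M, seq_len, K);
    channels 0..M-1 hold magnitudes, channels M..2M-1 hold phases.
    """
    features = np.concatenate([np.abs(stft), np.angle(stft)], axis=0)  # (2M, T, K)
    num_snapshots = features.shape[1] // seq_len  # trailing frames are dropped
    trimmed = features[:, :num_snapshots * seq_len]
    snapshots = trimmed.reshape(features.shape[0], num_snapshots, seq_len, features.shape[2])
    return snapshots.transpose(1, 0, 2, 3)
```

The excerpt's second point is a matter of counting: with a hypothetical grid of 37 DOA classes, two simultaneously active sources already give C(37, 2) = 666 distinct angle pairs, each of which a model trained on source combinations would need to see.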
“…Binaural cues are employed in [7], where the cross-correlation function (CCF) was used as features in a DNN to estimate the azimuth of a sound source with simulated head movement. CNN architectures were also used in [8,9] using frequency-domain features such as the phase or the magnitude of the signal.…”
Section: Introduction (mentioning, confidence: 99%)
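The cross-correlation features mentioned in these excerpts can be illustrated with GCC-PHAT, one widely used form of generalized cross-correlation. The excerpts do not say which weighting [7] or [117] use, so PHAT is an assumption here, and `gcc_phat` is a hypothetical helper. The sketch returns the correlation over a lag window, which could serve as a DNN input vector; its peak gives the time difference of arrival between the two channels.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC-PHAT between two channels over lags -max_lag..max_lag (in samples).

    A positive lag means y is delayed relative to x.
    """
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])  # reorder to negative..positive lags
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc
```

The lag window is usually bounded by the array geometry: for microphone spacing d, sampling rate f_s and speed of sound c, physically possible delays satisfy |τ| ≤ d·f_s/c samples.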