2018 26th European Signal Processing Conference (EUSIPCO) 2018
DOI: 10.23919/eusipco.2018.8553182

Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network

Abstract: This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources. The proposed stacked convolutional and recurrent neural network (DOAnet) generates a spatial pseudo-spectrum (SPS) along with the DOA estimates in both azimuth and elevation. We avoid any explicit feature extraction step by using the magnitudes and phases of the spectrograms of all the channels as input to the network. The proposed DOAnet is evaluated by estimating the DOAs of multiple concurrent…
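
The abstract's input representation (magnitudes and phases of every channel's spectrogram, with no further feature extraction) can be sketched roughly as follows. This is only an illustration, not the authors' code: the function name `magnitude_phase_features`, the Hann window, the frame length of 1024 samples, and the hop of 512 samples are all assumptions chosen for the example.

```python
import numpy as np

def magnitude_phase_features(audio, frame_len=1024, hop=512):
    """Stack per-channel STFT magnitudes and phases.

    audio: array of shape (n_samples, n_channels), n_samples >= frame_len.
    Returns an array of shape (n_frames, n_bins, 2 * n_channels), with
    magnitudes in the first n_channels slots and phases in the rest.
    frame_len and hop are illustrative values, not taken from the paper.
    """
    audio = np.asarray(audio, dtype=float)
    n_samples, n_channels = audio.shape
    n_frames = 1 + (n_samples - frame_len) // hop
    window = np.hanning(frame_len)

    # Sample indices of every frame: shape (n_frames, frame_len).
    starts = hop * np.arange(n_frames)
    idx = starts[:, None] + np.arange(frame_len)[None, :]

    # Gather frames for all channels: (n_frames, frame_len, n_channels).
    frames = audio[idx] * window[None, :, None]

    # One-sided FFT along the time axis of each frame.
    spec = np.fft.rfft(frames, axis=1)

    # Concatenate magnitude and phase along the channel axis.
    return np.concatenate([np.abs(spec), np.angle(spec)], axis=-1)
```

For a four-channel recording this yields eight feature maps per time-frequency bin, which is the kind of tensor a convolutional front end could consume directly.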

Cited by 203 publications (191 citation statements)
References 21 publications (33 reference statements)
“…For SED, we use the F-score and error rate (ER) calculated in one-second segments [8]. For DOA estimation we use two frame-wise metrics [9]: DOA error and frame recall. The DOA error is the average angular error in degrees between the predicted and reference DOAs.…”
Section: Metrics (mentioning)
confidence: 99%
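
The DOA error described in this statement, an average angular error in degrees between predicted and reference DOAs, can be sketched as below. This is a minimal illustration under stated assumptions: DOAs are given as (azimuth, elevation) pairs in degrees, the angle is measured as the great-circle distance on the unit sphere, and predictions are assumed to be already paired one-to-one with references (the matching step for multiple sources is not shown). The function names are hypothetical.

```python
import numpy as np

def angular_error_deg(az_pred, el_pred, az_ref, el_ref):
    """Great-circle angle in degrees between DOAs given in degrees."""
    az_p, el_p, az_r, el_r = np.radians([az_pred, el_pred, az_ref, el_ref])
    # Spherical law of cosines, with elevation measured from the horizon.
    cos_angle = (np.sin(el_p) * np.sin(el_r)
                 + np.cos(el_p) * np.cos(el_r) * np.cos(az_p - az_r))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def mean_doa_error(pred, ref):
    """Average angular error over (azimuth, elevation) pairs.

    pred and ref must have the same length and be matched one-to-one
    (an assumption of this sketch).
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    errors = angular_error_deg(pred[:, 0], pred[:, 1], ref[:, 0], ref[:, 1])
    return float(np.mean(errors))
```

Measuring the angle on the sphere rather than differencing azimuth and elevation separately avoids inflated errors near the poles, where large azimuth differences correspond to small actual angles.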
“…The first dataset is the Ambisonic, Anechoic and Synthetic Impulse Response (ANSYN) dataset [22,26], consisting of spatially located sound events in an anechoic scenario using simulated impulse responses. The dataset is divided in three subsets, O1, O2, O3, involving respectively a maximum number of 1, 2 and 3 simultaneously active sound events.…”
Section: Datasets (mentioning)
confidence: 99%
“…In this paper, we want to exploit the capabilities of both QNNs and Ambisonics to analyze 3D sounds, and in particular we focus on the localization and detection of 3D sound events. Both tasks have been widely investigated recently by using convolutional neural networks (CNNs) [19][20][21][22][23][24][25]. They are also considered as a joint task in [26] for 3D sounds, but considering each microphone signal as a separate real-valued signal.…”
Section: Introduction (mentioning)
confidence: 99%
“…[15] applied a larger neural network with three hidden layers to estimate the DOA of a single source and showed a performance advantage with respect to DOA estimation with monopulse. Other more recent works [16]-[19] have applied deep neural networks (DNNs) for estimating acoustic sources direction/position from a large number of realizations of microphones array. It was shown that neural networks attain relatively good accuracy compared to MUSIC [8] in challenging acoustic room environment conditions, such as reverberations and high noise.…”
Section: Introduction (mentioning)
confidence: 99%