Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network

Adavanne, Sharath; Politis, Archontis; Virtanen, Tuomas

doi:10.23919/eusipco.2018.8553182

Cited by 203 publications (191 citation statements); references 21 publications (33 reference statements)


“…For SED, we use the F-score and error rate (ER) calculated in one-second segments [8]. For DOA estimation we use two frame-wise metrics [9]: DOA error and frame recall. The DOA error is the average angular error in degrees between the predicted and reference DOAs.…”

Section: Metrics (mentioning), confidence: 99%

A Multi-room Reverberant Dataset for Sound Event Localization and Detection

Adavanne, Politis, Virtanen (2019)

Proceedings of the Detection and Classification of Acoustic Scenes And Events 2019 Workshop (DCASE2019)

Self Cite


This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge. The goal of the SELD task is to detect the temporal activities of a known set of sound event classes, and further localize them in space when active. As part of the challenge, a synthesized dataset with each sound event associated with a spatial coordinate represented using azimuth and elevation angles is provided. These sound events are spatialized using real-life impulse responses collected at multiple spatial coordinates in five different rooms with varying dimensions and material properties. A baseline SELD method employing a convolutional recurrent neural network is used to generate benchmark scores for this reverberant dataset. The benchmark scores are obtained using the recommended cross-validation setup.

Index Terms: Sound event localization and detection, sound event detection, direction of arrival, deep neural networks
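The SED and DOA metrics named in the quoted passage can be illustrated with a small sketch. It is a minimal Python example under stated assumptions, not the evaluation code used in the cited works: reference and predicted SED activity are given as per-segment sets of active class labels (one set per one-second segment); the segment-based F-score and error rate use the usual substitution/deletion/insertion counting; DOAs are (azimuth, elevation) pairs in degrees, as in the dataset described above; predictions are matched to references by brute-force search over assignments; the DOA error is averaged over matched pairs (an assumption); and frame recall is taken as the fraction of frames in which the number of predicted DOAs equals the number of reference DOAs. All function names are hypothetical.

```python
# Hypothetical helpers: a sketch of SED/DOA metrics, not the cited papers' evaluation code.
import itertools
import math
from typing import List, Sequence, Set, Tuple

Doa = Tuple[float, float]  # (azimuth, elevation) in degrees


def segment_f1_er(ref: Sequence[Set[str]], pred: Sequence[Set[str]]) -> Tuple[float, float]:
    """Segment-based F-score and error rate; element k = labels active in segment k."""
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for r, p in zip(ref, pred):
        seg_fp, seg_fn = len(p - r), len(r - p)
        tp += len(r & p)
        fp += seg_fp
        fn += seg_fn
        subs += min(seg_fn, seg_fp)      # a miss plus a false alarm counts as one substitution
        dels += max(0, seg_fn - seg_fp)  # remaining misses are deletions
        ins += max(0, seg_fp - seg_fn)   # remaining false alarms are insertions
        n_ref += len(r)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er


def to_unit(doa: Doa) -> Tuple[float, float, float]:
    """Convert (azimuth, elevation) in degrees to a Cartesian unit vector."""
    azi, ele = math.radians(doa[0]), math.radians(doa[1])
    return (math.cos(ele) * math.cos(azi), math.cos(ele) * math.sin(azi), math.sin(ele))


def angular_error(a: Doa, b: Doa) -> float:
    """Great-circle angle in degrees between two DOAs."""
    dot = sum(x * y for x, y in zip(to_unit(a), to_unit(b)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))  # clamp rounding error


def doa_metrics(ref_frames: Sequence[List[Doa]],
                pred_frames: Sequence[List[Doa]]) -> Tuple[float, float]:
    """Return (DOA error in degrees, frame recall) over time-aligned frames."""
    total_err, n_pairs, good_frames = 0.0, 0, 0
    for ref, pred in zip(ref_frames, pred_frames):
        if len(ref) == len(pred):
            good_frames += 1  # number of sources estimated correctly in this frame
        k = min(len(ref), len(pred))
        if k == 0:
            continue
        # Brute-force minimum-cost matching of k references to k predictions.
        total_err += min(
            sum(angular_error(r, p) for r, p in zip(rs, ps))
            for rs in itertools.combinations(ref, k)
            for ps in itertools.permutations(pred, k)
        )
        n_pairs += k
    doa_error = total_err / n_pairs if n_pairs else 0.0
    frame_recall = good_frames / len(ref_frames) if ref_frames else 0.0
    return doa_error, frame_recall
```

Brute-force matching is only practical for a handful of simultaneous sources; an assignment solver such as the Hungarian algorithm scales better when more sources overlap.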



“…The first dataset is the Ambisonic, Anechoic and Synthetic Impulse Response (ANSYN) dataset [22,26], consisting of spatially located sound events in an anechoic scenario using simulated impulse responses. The dataset is divided in three subsets, O1, O2, O3, involving respectively a maximum number of 1, 2 and 3 simultaneously active sound events.…”

Section: Datasets (mentioning), confidence: 99%

“…In this paper, we want to exploit the capabilities of both QNNs and Ambisonics to analyze 3D sounds, and in particular we focus on the localization and detection of 3D sound events. Both tasks have been widely investigated recently by using convolutional neural networks (CNNs) [19][20][21][22][23][24][25]. They are also considered as a joint task in [26] for 3D sounds, but considering each microphone signal as a separate real-valued signal.…”

Section: Introduction (mentioning), confidence: 99%

Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events

Comminiello, Lella, Scardapane, et al. (2019)

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Learning from data in the quaternion domain enables us to exploit internal dependencies of 4D signals and to treat them as a single entity. 3D acoustic signals in their spherical harmonics decomposition are a model that suits quaternion-valued data processing particularly well. In this paper, we address the problem of localizing and detecting sound events in the spatial sound field by using quaternion-valued data processing. In particular, we consider the spherical harmonic components of the signals captured by a first-order ambisonic microphone and process them by using a quaternion convolutional neural network. Experimental results show that the proposed approach exploits the correlated nature of the ambisonic signals, thus improving accuracy in 3D sound event detection and localization.
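The core operation behind a quaternion convolutional layer is the Hamilton product, which mixes all four components of a quaternion weight with all four components of the input, so the four channels are processed jointly rather than as separate real-valued signals. The sketch below assumes, for illustration only, that the four first-order ambisonic channels of each time sample are packed into one quaternion (r, i, j, k); the function names and this channel-to-component mapping are hypothetical and not taken from the cited paper, which may organize its layers differently.

```python
# Hypothetical sketch: Hamilton product and a valid-mode 1D quaternion convolution.
# Assumption: each time sample of a 4-channel ambisonic signal is one quaternion (r, i, j, k).
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (r, i, j, k)


def hamilton(q: Quaternion, p: Quaternion) -> Quaternion:
    """Hamilton product q * p (not commutative)."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_conv1d(x: List[Quaternion], w: List[Quaternion]) -> List[Quaternion]:
    """Valid-mode 1D quaternion convolution (CNN-style cross-correlation, no bias)."""
    out: List[Quaternion] = []
    for n in range(len(x) - len(w) + 1):
        acc = (0.0, 0.0, 0.0, 0.0)
        for m, wm in enumerate(w):
            prod = hamilton(wm, x[n + m])  # weight multiplies input from the left
            acc = tuple(s + t for s, t in zip(acc, prod))
        out.append(acc)
    return out
```

Because the Hamilton product is not commutative (i·j = k but j·i = -k), multiplying the weight from the left is a design choice; a practical layer would also add bias terms, multiple output channels, and a nonlinearity.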


“…[15] applied a larger neural network with three hidden layers to estimate the DOA of a single source and showed a performance advantage with respect to DOA estimation with monopulse. Other more recent works [16]-[19] have applied deep neural networks (DNNs) for estimating acoustic source direction/position from a large number of realizations of microphone arrays. It was shown that neural networks attain relatively good accuracy compared to MUSIC [8] in challenging acoustic room environment conditions, such as reverberations and high noise.…”

Section: Introduction (mentioning), confidence: 99%

Performance Advantages of Deep Neural Networks for Angle of Arrival Estimation

Bialer, Garnett, Tirer (2019)

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


The problem of estimating the number of sources and their angles of arrival from a single antenna array observation has been an active area of research in the signal processing community for the last few decades. When the number of sources is large, the maximum likelihood estimator is intractable due to its very high complexity, and therefore alternative signal processing methods have been developed with some performance loss. In this paper, we apply a deep neural network (DNN) approach to the problem and analyze its advantages with respect to signal processing algorithms. We show that an appropriately designed network can attain the maximum likelihood performance with feasible complexity and outperform other feasible signal processing estimation methods over various signal-to-noise ratios and array response inaccuracies.

Index Terms: Angle of arrival, deep neural networks, model order determination, single snapshot
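For a single source, the maximum likelihood angle estimate from one snapshot reduces to a one-dimensional grid search for the peak of |a(θ)^H x|², where a(θ) is the array steering vector. The sketch below assumes a uniform linear array with half-wavelength element spacing, one narrowband source with unknown complex amplitude, and white Gaussian noise; the function names are hypothetical. It illustrates the classical estimator whose multi-source version the abstract calls intractable, not the DNN approach of the cited paper.

```python
# Hypothetical sketch: single-source, single-snapshot ML angle estimation for a ULA.
# Assumptions: half-wavelength spacing, one narrowband source, white Gaussian noise.
import cmath
import math
from typing import List, Sequence


def steering_vector(theta_deg: float, n_elements: int, spacing: float = 0.5) -> List[complex]:
    """ULA response to a plane wave from theta_deg; spacing is in wavelengths."""
    phase = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * m) for m in range(n_elements)]


def ml_single_source(x: Sequence[complex], grid_deg: Sequence[float],
                     spacing: float = 0.5) -> float:
    """Return the grid angle maximizing |a(theta)^H x|^2."""
    best_theta, best_score = grid_deg[0], -1.0
    for theta in grid_deg:
        a = steering_vector(theta, len(x), spacing)
        # ||a||^2 = M is the same for every theta, so normalization is omitted.
        score = abs(sum(ai.conjugate() * xi for ai, xi in zip(a, x))) ** 2
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta
```

With a noiseless snapshot generated as `[0.8 * v for v in steering_vector(20.0, 8)]` and a 0.5° grid from -90° to 90°, the search returns 20.0. For K sources the ML criterion requires a K-dimensional search over angle combinations, which is the complexity growth the abstract refers to.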

