Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), 2019
DOI: 10.33682/xb0q-a335

Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network

Abstract: This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN). We use a CRNN previously proposed for the localization and detection of stationary sources, and show that the recurrent layers enable the spatial tracking of moving sources when trained with dynamic scenes. The tracking performance of the CRNN is compared with a stand-alone tracking method that combines a multisource direction-of-arrival (DOA) estimator and a particle filter. Their respecti…
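
For orientation, the architecture the abstract describes (convolutional blocks feeding recurrent layers, with parallel detection and localization outputs) can be sketched roughly as below. This is a minimal PyTorch illustration, not the authors' code; the layer sizes, class count, and input shape are assumptions made for the example.

```python
# Minimal sketch of a SELDnet-style CRNN (illustrative, not the authors'
# implementation). Assumed: 8 input feature channels (magnitude + phase of
# 4 mics), 512 frequency bins, 11 event classes, 3-D DOA regression.
import torch
import torch.nn as nn

class SELDCRNN(nn.Module):
    def __init__(self, n_channels=8, n_freq=512, n_classes=11, n_doa=3):
        super().__init__()
        # Convolutional blocks pool only along frequency, preserving the
        # time axis for the recurrent layers.
        self.conv = nn.Sequential(
            nn.Conv2d(n_channels, 64, 3, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(), nn.MaxPool2d((1, 8)),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(), nn.MaxPool2d((1, 8)),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(), nn.MaxPool2d((1, 2)),
        )
        rnn_in = 64 * (n_freq // (8 * 8 * 2))
        # The recurrent layers are what allow the network to follow
        # moving sources over time, per the abstract.
        self.gru = nn.GRU(rnn_in, 128, num_layers=2,
                          batch_first=True, bidirectional=True)
        # Two parallel heads: sound event detection (per-class activity)
        # and DOA regression (e.g. x, y, z per class).
        self.sed = nn.Linear(256, n_classes)
        self.doa = nn.Linear(256, n_classes * n_doa)

    def forward(self, x):
        # x: (batch, channels, time, freq) stacked magnitude/phase features
        h = self.conv(x)                      # (B, 64, T, F')
        B, C, T, F = h.shape
        h = h.permute(0, 2, 1, 3).reshape(B, T, C * F)
        h, _ = self.gru(h)                    # (B, T, 256)
        sed = torch.sigmoid(self.sed(h))      # per-frame class activity
        doa = torch.tanh(self.doa(h))         # per-frame DOA estimates
        return sed, doa
```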

Cited by 33 publications (38 citation statements); references 26 publications.
“…Most of these considerations were addressed in the recent dataset for the new DCASE2020 challenge [55]. A significant advance is the introduction of reverberant moving sources, still based on captured RIRs from real spaces [55], [56]. Moreover, ambient noise occurs at varying levels, reverberant conditions are stronger and more varied, and event locations do not occur in a sparse regular grid but can vary more or less continuously.…”
Section: Discussion (mentioning)
confidence: 99%
“…In this context, most studies have evaluated sound localization performance in the frontal horizontal hemisphere with static sound scenarios (Grantham et al 2007;Kerber & Seeber 2012;Jones et al 2014;Dorman et al 2016). In the future, modern hearing aid and implant audio processors will apply advanced, data science-based auditory scene analysis for improved hearing outcomes (Ma et al 2017;Adavanne et al 2019).…”
Section: Discussion (mentioning)
confidence: 99%
“…Similar to the approaches proposed in SELDnet [31] and DOAnet [32], we extracted magnitude and phase components from spectrograms of each channel of the microphone array. The magnitude and phase components were then stacked along the channel dimension and treated as an M/2 × 2C image, where M is the window length of the Fourier transformation and C is the number of channels (Fig.
Section: A. Feature Extraction, 1) Audio (mentioning)
confidence: 99%
“…After processing the whole audio sequence with 50% overlap between windows, the shape of the audio data was T × M/2 × 2C, where T is the number of data points in the time dimension. The parameters used for our feature extraction followed those in SELDnet [31] and DOAnet [32].…”
Footnote 1: Our code is available at: https://intuitivecomputing.jhu.edu/openscience.html
Section: A. Feature Extraction, 1) Audio (mentioning)
confidence: 99%
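
The feature pipeline quoted in the last two statements (per-channel STFT magnitude and phase with 50% overlap, stacked along the channel axis into a T × M/2 × 2C tensor) can be reproduced roughly as follows. This is a sketch assuming NumPy/SciPy; the function name, sample rate, and window length M = 1024 are illustrative assumptions, not values taken from the cited papers.

```python
# Sketch of the magnitude/phase feature extraction described above.
# Assumed: window length M = 1024, hop M/2 (50% overlap), C channels.
import numpy as np
from scipy.signal import stft

def extract_features(audio, fs=48000, M=1024):
    """audio: (n_samples, C) multichannel signal.
    Returns features of shape (T, M/2, 2C): the magnitudes of all C
    channels, then their phases, stacked along the last axis."""
    mags, phases = [], []
    for ch in range(audio.shape[1]):
        # noverlap = M // 2 gives the 50% window overlap
        _, _, Z = stft(audio[:, ch], fs=fs, nperseg=M, noverlap=M // 2)
        Z = Z[1:, :]                  # drop the DC bin, keeping M/2 bins
        mags.append(np.abs(Z).T)      # (T, M/2)
        phases.append(np.angle(Z).T)  # (T, M/2)
    # Stack along the channel dimension: (T, M/2, 2C)
    return np.stack(mags + phases, axis=-1)

# Example: a 4-channel, 1-second clip yields a (T, 512, 8) tensor
x = np.random.randn(48000, 4)
print(extract_features(x).shape)
```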