Proceedings of the Detection and Classification of Acoustic Scenes And Events 2019 Workshop (DCASE2019) 2019
DOI: 10.33682/4jhy-bj81

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

citations
Cited by 90 publications
(121 citation statements)
references
References 22 publications
“…Input features are logscale Mel-magnitude spectrogram (logmels) and Generalized Cross-Correlation Phase Transform (GCC-PHAT) of the mutual channels, as in [12,26]. All wav-files were downsampled at a sampling rate of 32 kHz.…”
[Footnote 2 in the quoted source: “Since distance from the listener is not relevant for the task, when converting to and from cartesian coordinates, we always assume the norm r = 1, that is we consider direction of arrivals as points on the unit sphere.”]
Section: Methods (mentioning)
confidence: 99%
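The unit-sphere convention described in the quoted statement above (norm r = 1 when converting to and from Cartesian coordinates) can be illustrated with a minimal sketch. The angle convention used here (azimuth measured in the x-y plane from the x axis, elevation measured from the x-y plane, both in degrees) and the function names `sph_to_cart` and `cart_to_sph` are assumptions for illustration; the citing paper does not specify them.

```python
import math


def sph_to_cart(azimuth_deg, elevation_deg):
    """Map a direction of arrival to a point on the unit sphere (r = 1).

    Assumed convention: azimuth in the x-y plane from the x axis,
    elevation from the x-y plane, both in degrees.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    return x, y, z


def cart_to_sph(x, y, z):
    """Map a Cartesian vector back to (azimuth, elevation) in degrees.

    The norm of the input is ignored, so any positive scaling of the
    same direction yields the same pair of angles.
    """
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```

Because distance is discarded, a round trip through `sph_to_cart` and `cart_to_sph` recovers the original angles regardless of how the intermediate vector is scaled.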
“…The approaches that have been adopted to solve this problem can be classified into two main categories: parametric-based methods, like multiple signal classification (MUSIC) [1] and others [2][3][4], and deep neural network (DNN)-based methods [5][6][7][8][9][10][11][12][13][14][15][16][17]. DNN-based models often combine DOA estimation with other tasks such as sound activity detection (SAD), estimation of the number of active sources and sound event detection (SED) [11][12][13]. In particular, Sound Event Localization and Detection was Task 3 of the Detection and Classification of Acoustic Scenes and Events 2019 Challenge (DCASE2019 Challenge) [18].…”
Section: Introduction (mentioning)
confidence: 99%
“…Cao et al (Cao Surrey) [19], had the second best performing system, following the first one closely. However, the authors kept the general SELDnet architecture and advanced it with a number of informed domain-specific choices.…”
Section: B Analysis Of Individual Systems (mentioning)
confidence: 99%
“…Additionally, they used both FOA and MIC input and ensemble averaging. According to ablation studies in [19], the better input features and the two-stage training architecture have a drastic effect in performance.…”
Section: B Analysis Of Individual Systems (mentioning)
confidence: 99%
“…Although Sound Event Detection and Localization (SEDL), as well as anomalous SED, appear to fit the criteria mentioned above, it is essential to point out that these two areas belong to different domains. SEDL is the combined task of identifying temporal activities of each sound event and the estimation of their respective spatial location trajectories when active [33][34][35].…”
Section: Introduction (mentioning)
confidence: 99%