2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
DOI: 10.1109/waspaa.2019.8937193

Polyphonic Sound Event and Sound Activity Detection: A Multi-Task Approach

Abstract: Polyphonic Sound Event Detection (SED) in real-world recordings is a challenging task because of the dynamic polyphony level, intensity, and duration of sound events. Current polyphonic SED systems fail to model the temporal structure of sound events explicitly and instead attempt to look at which sound events are present at each audio frame. Consequently, the event-wise detection performance is much lower than the segment-wise detection performance. In this work, we propose a joint model approach to improve t…

Cited by 15 publications (17 citation statements)
References 23 publications (27 reference statements)
“…We have compared the SED performance of our methods with those of other conventional methods, such as SED using an α min-max subsampling method within CRNN [18], the batch dice loss-based SED [23,27], multitask learning of SED and sound activity detection [28], and Transformer-based SED [12,13]. As the Transformer-based SED, we used three CNN layers with the same structure as the CNN-BiGRU, followed by two Transformer encoder layers and two dense layers.…”
Section: Comparison With Conventional Methods
confidence: 99%
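The Transformer-based SED comparison system quoted above stacks a CNN front end (three convolutional layers), two Transformer encoder layers, and two dense layers. Below is a minimal PyTorch sketch of that layout; the filter counts, pooling factors, number of attention heads, and class count are illustrative assumptions, not the cited papers' exact settings.

```python
import torch.nn as nn

class TransformerSED(nn.Module):
    def __init__(self, n_mels=64, n_classes=10, d_model=128):
        super().__init__()
        # Three CNN blocks; each pools along the frequency axis only,
        # so the frame (time) resolution is preserved.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
            nn.Conv2d(128, d_model, 3, padding=1), nn.BatchNorm2d(d_model), nn.ReLU(),
            nn.MaxPool2d((1, n_mels // 16)),   # collapse the remaining mel bins
        )
        # Two Transformer encoder layers in place of the recurrent block.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Two dense layers produce frame-wise event probabilities.
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, n_classes), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, 1, frames, n_mels)
        h = self.cnn(x)                        # (batch, d_model, frames, 1)
        h = h.squeeze(-1).transpose(1, 2)      # (batch, frames, d_model)
        h = self.encoder(h)
        return self.head(h)                    # (batch, frames, n_classes)
```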
“…To further demonstrate the utility of the proposed method, we applied our method to the conventional method for SED [27], which is based on the multitask learning of SED and SAD. The experimental results show that the event detection performance of SED + SAD [27] is better than that of CRNN (event). Moreover, the F-score of event detection in proposed + SAD (β = 0.01) was improved by 1.4 percentage points compared with that of SED + SAD [27].…”
Section: Overall Performances of SED and ASC
confidence: 99%
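The multitask SED + SAD approach referred to above trains sound event detection jointly with a binary sound activity detector. Below is a hedged PyTorch sketch of one way to set this up: a shared CRNN trunk with two output heads and a weighted sum of binary cross-entropy losses. The trunk layout, head sizes, and the use of β as the auxiliary-loss weight are illustrative assumptions, not the exact configuration of [27] or [28].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSEDSAD(nn.Module):
    """Shared CRNN trunk with an event-wise SED head and a binary SAD head."""
    def __init__(self, n_mels=64, n_classes=10, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, hidden, 3, padding=1), nn.BatchNorm2d(hidden), nn.ReLU(),
            nn.MaxPool2d((1, n_mels)),          # collapse the frequency axis
        )
        self.gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.sed_head = nn.Linear(2 * hidden, n_classes)   # per-class activity
        self.sad_head = nn.Linear(2 * hidden, 1)           # "any event" activity

    def forward(self, x):                           # x: (batch, 1, frames, n_mels)
        h = self.cnn(x).squeeze(-1).transpose(1, 2)  # (batch, frames, hidden)
        h, _ = self.gru(h)
        return torch.sigmoid(self.sed_head(h)), torch.sigmoid(self.sad_head(h))

def multitask_loss(sed_out, sad_out, sed_target, beta=0.01):
    # SAD target: a frame counts as active if any event class is active in it
    # (targets are float tensors of 0s and 1s).
    sad_target = sed_target.max(dim=-1, keepdim=True).values
    return (F.binary_cross_entropy(sed_out, sed_target)
            + beta * F.binary_cross_entropy(sad_out, sad_target))
```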
“…As the baseline model of SED, we used the convolutional neural network and bidirectional gated recurrent unit (CNN-BiGRU) [8]. Moreover, to verify the usefulness of the proposed method, we used a model combining SED and sound activity detection (SAD) based on multitask learning (MTL), referred to as "MTL of SED & SAD" [25], and a model combining SED and ASC, referred to as "MTL of SED & ASC" [13]. Sound activity detection is the mechanism of recognizing whether any events are active in a time frame.…”
Section: Experimental Conditions
confidence: 99%
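The closing sentence of the statement above defines sound activity detection as recognizing whether any event is active in a given frame. As a small illustrative sketch (the function and variable names are ours, not from the cited papers), the frame-level SAD target can be derived from multi-label SED annotations by taking the logical OR over event classes:

```python
import numpy as np

def sad_targets_from_sed(sed_labels: np.ndarray) -> np.ndarray:
    """(frames, n_classes) binary SED matrix -> (frames,) binary SAD vector."""
    return (sed_labels.sum(axis=-1) > 0).astype(np.float32)

# Example: 4 frames, 3 event classes.
sed = np.array([[0, 0, 0],
                [1, 0, 0],
                [1, 1, 0],
                [0, 0, 0]])
print(sad_targets_from_sed(sed))   # [0. 1. 1. 0.]
```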