2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461975

Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network

Abstract: In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge. The audio clips in this task, which are extracted from YouTube videos, are manually labelled with one or a few audio tags but without time stamps of the audio events, which is known as weakly labelled…
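The abstract's two key ideas, gating the convolutional feature maps and attention-based temporal pooling of frame-level predictions into clip-level tags, can be sketched in a few lines. This is a minimal illustration rather than the authors' released model: the single gated block, the layer sizes, and the softmax-over-time attention normalisation below are assumptions made for clarity.

```python
# Minimal sketch (assumed architecture details, not the authors' code) of a
# gated convolutional block and attention-based temporal pooling for
# weakly labelled audio tagging.
import torch
import torch.nn as nn


class GatedConvBlock(nn.Module):
    """Convolution whose output is modulated by a learned sigmoid gate (GLU-style)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        # Element-wise product of the linear path and its sigmoid gate.
        return self.bn(self.conv(x) * torch.sigmoid(self.gate(x)))


class AttentionPooling(nn.Module):
    """Aggregates frame-level class probabilities into clip-level tags."""

    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.cla = nn.Linear(feat_dim, n_classes)  # frame-level classifier
        self.att = nn.Linear(feat_dim, n_classes)  # frame-level attention scores

    def forward(self, x):
        # x: (batch, time, feat_dim)
        frame_prob = torch.sigmoid(self.cla(x))      # (B, T, C) frame predictions
        att = torch.softmax(self.att(x), dim=1)      # attention normalised over time
        clip_prob = (frame_prob * att).sum(dim=1)    # (B, C) clip-level tag probabilities
        return clip_prob, frame_prob                 # frame_prob gives a rough localization


if __name__ == "__main__":
    # Toy log-mel spectrogram batch: (batch, channel, time, mel bins).
    x = torch.randn(2, 1, 240, 64)
    feats = GatedConvBlock(1, 32)(x)                 # (2, 32, 240, 64)
    feats = feats.mean(dim=3).transpose(1, 2)        # average over mel bins -> (2, 240, 32)
    clip_prob, frame_prob = AttentionPooling(32, 17)(feats)
    print(clip_prob.shape, frame_prob.shape)         # (2, 17) and (2, 240, 17)
```

The element-wise gate lets the network emphasise time-frequency regions that carry the tagged events, and weighting frame-level predictions by learned attention over time is one reading of what the truncated abstract calls temporal attention-based localization.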

Cited by 181 publications (191 citation statements).
References 13 publications (13 reference statements).
“…Weakly supervised methods are also a common kind of algorithm for AED tasks [27], [28], [29]. Usually, it is time-consuming and laborious to accurately annotate the onset and offset of one acoustic event.…”
Section: B. Weakly Supervised Event Detection (mentioning)
confidence: 99%
“…Most recent advances in polyphonic SED are largely attributed to the use of Machine Learning and Deep Neural Networks [8,9,10,11,12,13]. In particular, the use of Convolutional Recurrent Neural Networks (CRNNs) has significantly improved SED performance in the past few years [14,15,16,17]. However, there are three main disadvantages with current CRNN-based polyphonic SED approaches.…”
Section: Related Work (mentioning)
confidence: 99%
“…These micro-averaged F1 scores can most directly be compared to the outcomes reported between parentheses in Table 1, as their computation is based on the same data [12]. Table 2 (F1 scores of prior audio classification models): fusion of gated convolutional recurrent networks [21], 55.6%; capsule-based gated convolutional network [9], 58.6%.…”
Section: 13 (mentioning)
confidence: 99%
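For context on the metric used in that comparison: micro-averaged F1 pools true positives, false positives and false negatives across all tags before computing precision and recall. A minimal sketch with made-up multi-label predictions (not data from the cited papers):

```python
# Toy illustration of micro-averaged F1 for multi-label audio tagging;
# the labels below are invented and not taken from the cited papers.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1],   # rows: clips, columns: tags
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

# TP, FP and FN are pooled over all tags before precision and recall are formed:
# here TP = 3, FP = 0, FN = 2, so precision = 1.0, recall = 0.6, F1 = 0.75.
print(f1_score(y_true, y_pred, average="micro"))
```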