2015 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2015.7280624

Polyphonic sound event detection using multi label deep neural networks

Cited by 187 publications (147 citation statements)
References 14 publications
“…These features have been shown to perform well in various audio tagging and sound event detection tasks [12,13,9]. First, we obtained the magnitude spectrum of the audio signals by applying the short-time Fourier transform (STFT) over 40 ms audio frames with 50% overlap, windowed with a Hamming window.…”
Section: Features
confidence: 99%
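The feature-extraction step this excerpt describes (40 ms Hamming-windowed frames, 50% overlap, magnitude spectrum via STFT) can be sketched as follows; the function and parameter names are ours, not taken from the paper:

```python
import numpy as np

def stft_magnitude(signal, sr=44100, frame_ms=40, overlap=0.5):
    """Magnitude spectrum via STFT over 40 ms Hamming-windowed frames
    with 50% overlap, as described in the excerpt. Function and
    parameter names are assumptions, not from the paper."""
    frame_len = int(sr * frame_ms / 1000)       # samples per 40 ms frame
    hop = int(frame_len * (1 - overlap))        # hop size giving 50% overlap
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # one magnitude spectrum per frame

# Usage: 1 s of a 440 Hz tone at 44.1 kHz
t = np.arange(44100) / 44100
mag = stft_magnitude(np.sin(2 * np.pi * 440 * t))
```

In practice the magnitude spectra are usually mapped further onto a mel filterbank before being fed to the network, but the framing and windowing step is the part the excerpt specifies.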
“…Regarding the more challenging overlapped scenario, different approaches include temporally-constrained probabilistic analysis models [5], generalized Hough-transform based systems [6], HMM-based systems with multiple-path Viterbi decoding [7], non-negative matrix factorization [8], and multi-label deep neural networks. In particular, the latter have shown good performance by modeling overlapping events in a natural way [9,10].…”
Section: Introduction
confidence: 99%
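The multi-label modeling the excerpt credits for handling overlapping events can be sketched minimally: a sigmoid output layer gives each event class an independent probability, so several events may be active in the same frame. The sizes and names below are toy assumptions, not from the cited systems:

```python
import numpy as np

def multilabel_output(hidden, W, b):
    """Sigmoid output layer: each event class gets an independent
    probability in (0, 1), so several events can be active in the
    same frame -- unlike a softmax, which forces a single winner."""
    logits = hidden @ W + b
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
hidden = rng.standard_normal((1, 8))  # one frame's hidden activations (toy size)
W = rng.standard_normal((8, 5))       # weights for 5 hypothetical event classes
probs = multilabel_output(hidden, W, np.zeros(5))
active = probs[0] > 0.5               # threshold each class independently
```

Because the class probabilities are independent, their sum is not constrained to 1, which is exactly what lets the network represent polyphonic (overlapping) events naturally.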
“…In this paper, acoustic events are defined as active when they are present within the frames under consideration. According to [21], accuracy is high for high threshold values at low polyphony levels, where the polyphony level reflects the number of active sources. On the other hand, accuracy is high for low threshold values when the acoustic signal stream is highly polyphonic.…”
Section: Introduction
confidence: 99%
“…However, the level of polyphony of the test audio stream is unknown and varies from frame to frame. In [21][22][23], the thresholds were manually set to 0.5, 0.95, and 0.5, respectively, which cannot capture the polyphony level of the test acoustic stream at each frame.…”
Section: Introduction
confidence: 99%
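The fixed-threshold binarization this excerpt criticizes can be illustrated as follows; the probability values are illustrative only, not taken from any of the cited papers:

```python
import numpy as np

# Fixed-threshold binarization of frame-wise event probabilities, as used
# in the cited systems (global thresholds of 0.5 / 0.95).
probs = np.array([[0.90, 0.60, 0.10],   # frame 1: two events above 0.5
                  [0.40, 0.20, 0.30]])  # frame 2: no event above 0.5
active = probs >= 0.5                   # same global threshold for every frame
polyphony = active.sum(axis=1)          # estimated active events per frame
# A single global threshold cannot adapt to the true per-frame polyphony,
# which is the limitation the excerpt points out.
```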