2015 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2015.7280624

Polyphonic sound event detection using multi label deep neural networks

Cited by 187 publications (147 citation statements)
References 14 publications
“…These features have been shown to perform well in various audio tagging and sound event detection tasks [12,13,9]. First, we obtained the magnitude spectrum of the audio signals by applying the short-time Fourier transform (STFT) over 40 ms audio frames with 50% overlap, windowed with a Hamming window.…”
Section: Features
confidence: 99%
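The feature-extraction step this excerpt describes (40 ms Hamming-windowed frames, 50% overlap, magnitude spectrum via STFT) can be sketched as follows; the function and parameter names are ours, not taken from the paper:

```python
import numpy as np

def stft_magnitude(signal, sr=44100, frame_ms=40, overlap=0.5):
    """Magnitude spectrum via STFT over 40 ms Hamming-windowed frames
    with 50% overlap, as described in the excerpt. Function and
    parameter names are assumptions, not from the paper."""
    frame_len = int(sr * frame_ms / 1000)       # samples per 40 ms frame
    hop = int(frame_len * (1 - overlap))        # hop size giving 50% overlap
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # one magnitude spectrum per frame

# Usage: 1 s of a 440 Hz tone at 44.1 kHz
t = np.arange(44100) / 44100
mag = stft_magnitude(np.sin(2 * np.pi * 440 * t))
```

In practice the magnitude spectra are usually mapped further onto a mel filterbank before being fed to the network, but the framing and windowing step is the part the excerpt specifies.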
“…Regarding the more challenging overlapped scenario, different approaches include temporally-constrained probabilistic analysis models [5], generalized Hough-transform based systems [6], HMM-based systems with multiple-path Viterbi decoding [7], non-negative matrix factorization [8], and multi-label deep neural networks. In particular, the latter have shown good performance by modeling overlapping events in a natural way [9,10].…”
Section: Introduction
confidence: 99%
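The multi-label modeling the excerpt credits for handling overlapping events can be sketched minimally: a sigmoid output layer gives each event class an independent probability, so several events may be active in the same frame. The sizes and names below are toy assumptions, not from the cited systems:

```python
import numpy as np

def multilabel_output(hidden, W, b):
    """Sigmoid output layer: each event class gets an independent
    probability in (0, 1), so several events can be active in the
    same frame -- unlike a softmax, which forces a single winner."""
    logits = hidden @ W + b
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
hidden = rng.standard_normal((1, 8))  # one frame's hidden activations (toy size)
W = rng.standard_normal((8, 5))       # weights for 5 hypothetical event classes
probs = multilabel_output(hidden, W, np.zeros(5))
active = probs[0] > 0.5               # threshold each class independently
```

Because the class probabilities are independent, their sum is not constrained to 1, which is exactly what lets the network represent polyphonic (overlapping) events naturally.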
“…In this paper, acoustic events are defined as active when they are present within the frames under consideration. According to [21], accuracy is high for high threshold values at low polyphony levels, where the polyphony level reflects the number of active sources. On the other hand, accuracy is high for low threshold values when the acoustic signal stream is highly polyphonic.…”
Section: Introduction
confidence: 99%
“…However, the level of polyphony of the test audio stream is unknown and varies from frame to frame. In [21][22][23], the thresholds were manually set to 0.5, 0.95, and 0.5, respectively, which cannot capture the polyphony level of the test acoustic stream at each frame.…”
Section: Introduction
confidence: 99%
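The fixed-threshold binarization this excerpt criticizes can be illustrated as follows; the probability values are illustrative only, not taken from any of the cited papers:

```python
import numpy as np

# Fixed-threshold binarization of frame-wise event probabilities, as used
# in the cited systems (global thresholds of 0.5 / 0.95).
probs = np.array([[0.90, 0.60, 0.10],   # frame 1: two events above 0.5
                  [0.40, 0.20, 0.30]])  # frame 2: no event above 0.5
active = probs >= 0.5                   # same global threshold for every frame
polyphony = active.sum(axis=1)          # estimated active events per frame
# A single global threshold cannot adapt to the true per-frame polyphony,
# which is the limitation the excerpt points out.
```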