Sound event detection systems, which classify events in audio data, typically comprise two main stages: in the first, sound events are separated from the acoustic background; in the second, the detected events are classified. In recent years this research area has grown increasingly popular across a wide range of applications, such as surveillance and the learning and recognition of urban activity patterns, particularly when combined with imaging sensors. It nonetheless poses challenging problems owing to noise, the complexity of the events, poor microphone quality, unfavorable microphone placement, and events occurring simultaneously. This research compared accurate signal-processing and classification methods in order to propose a novel method for detecting sound events against the background in urban scenes. Advantages of the proposed method include the use of wavelet and Mel-frequency cepstral coefficients, an analysis of the effect of the classification method, and a reduction in the amount of training data required. Applied to a standard sound database, the proposed method achieved an accuracy of about 99% in event detection.
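As an illustration of the feature-extraction step mentioned above, the following is a minimal numpy-only sketch of the standard Mel-frequency cepstral coefficient (MFCC) pipeline: framing, windowing, power spectrum, mel filterbank, log compression, and a DCT-II. All parameter values (sample rate, frame length, filter count) are generic defaults assumed for illustration, not the settings used in this work, and the synthetic sine-wave clip merely stands in for a recorded urban sound.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel-filterbank energies (small floor avoids log(0)).
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the filterbank energies; keep the first n_ceps.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return energies @ dct.T

# One second of a 440 Hz tone stands in for a recorded audio clip.
sr = 16000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440.0 * t)
feats = mfcc(clip, sr=sr)
print(feats.shape)  # one 13-coefficient vector per 10 ms hop
```

In a detection pipeline such as the one described here, these per-frame coefficient vectors (often alongside wavelet features) would then be fed to the classification stage.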