Recurrent neural networks for polyphonic sound event detection in real life recordings

Parascandolo, Giambattista; Huttunen, Heikki; Virtanen, Tuomas

doi:10.1109/icassp.2016.7472917

Cited by 283 publications

(187 citation statements)

References 22 publications

Mentioning citation statements: 182

“…The main metric used in previous works [11], [14], [15] on TUT-SED 2009 dataset differs from the F1 score calculation used in this paper. In previous works, F1 score was computed in each segment, then averaged along segments for each scene, and finally averaged across scene scores, instead of accumulating intermediate statistics.…”

Section: B. Evaluation Metrics (mentioning)

confidence: 94%

“…With the emergence of more advanced deep learning techniques and publicly available real-life databases that are suitable for the task, polyphonic SED has attracted more interest in recent years. Non-negative matrix factorization (NMF) based source separation [14] and deep learning based methods (such as feedforward neural networks (FNN) [15], CNN [16] and RNN [11]) have been shown to perform significantly better compared to established methods such as GMM-HMM for polyphonic SED.…”

Section: Introduction (mentioning)

confidence: 99%

“…Recurrent neural networks (RNNs), which have been successfully applied to automatic speech recognition (ASR) [20] and polyphonic SED [11], solve the latter shortcoming by integrating information from the earlier time windows, presenting a theoretically unlimited context information. However, RNNs do not easily capture the invariance in the frequency domain, rendering a high-level modeling of the data more difficult.…”

Section: Introduction (mentioning)

confidence: 99%
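The quote credits RNNs with integrating information from earlier time windows. The numpy sketch below illustrates that mechanism under several assumptions: a plain tanh recurrence instead of an LSTM, random untrained weights, and hypothetical sizes and names. Perturbing only the first frame changes the output at the last frame. Perturbing only the last frame leaves every earlier output unchanged. Each class gets its own sigmoid output, so several events can be active in the same frame, which is what polyphonic detection requires.

```python
import numpy as np


def rnn_frame_probs(x, W, U, b, V, c):
    """Plain tanh RNN over feature frames, one sigmoid output per event class.

    x: (T, F) frames; W: (H, F); U: (H, H); b: (H,); V: (K, H); c: (K,).
    Returns a (T, K) array of per-frame class probabilities.
    """
    h = np.zeros(U.shape[0])
    probs = []
    for x_t in x:
        # The new state mixes the current frame with the state carried from earlier frames.
        h = np.tanh(W @ x_t + U @ h + b)
        # Independent sigmoids (not a softmax), so several classes can be active at once.
        probs.append(1.0 / (1.0 + np.exp(-(V @ h + c))))
    return np.stack(probs)


rng = np.random.default_rng(0)
T, F, H, K = 10, 8, 16, 3  # hypothetical sizes
W = 0.3 * rng.normal(size=(H, F))
U = 0.9 * rng.normal(size=(H, H)) / np.sqrt(H)
b, V, c = np.zeros(H), rng.normal(size=(K, H)), np.zeros(K)

x = rng.normal(size=(T, F))
x_first = x.copy()
x_first[0] += 1.0  # change only the first frame
x_last = x.copy()
x_last[-1] += 1.0  # change only the last frame

p = rnn_frame_probs(x, W, U, b, V, c)
p_first = rnn_frame_probs(x_first, W, U, b, V, c)
p_last = rnn_frame_probs(x_last, W, U, b, V, c)

print(p.shape)                                # (10, 3)
print(np.abs(p[-1] - p_first[-1]).max() > 0)  # True: frame 0 still reaches the last output
print(np.array_equal(p[:-1], p_last[:-1]))    # True: earlier outputs never see later frames
```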

“…Previous work on sound events has been mostly focused on sound event classification, where audio clips consisting of sound events are classified. Apart from established classifiers, such as support vector machines [1], [3], deep learning methods such as deep belief networks [7], convolutional neural networks (CNN) [8], [9], [10] and recurrent neural networks (RNN) [4], [11] have been recently proposed. Initially, the interest on SED was more focused on monophonic SED.…”

Section: Introduction (mentioning)

confidence: 99%

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Çakır, Parascandolo, Heittola, et al. (2017)

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

437

327

Abstract: Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.
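The abstract describes convolutional layers that extract features invariant to local spectral and temporal variations, followed by a recurrent layer that models longer-term temporal context. The numpy sketch below runs a single forward pass of that idea. It is not the paper's configuration. Its assumptions are:

- one convolution layer with ReLU;
- max pooling along frequency only, so every output frame still lines up with an input frame;
- a plain tanh recurrence standing in for a gated recurrent layer;
- random, untrained weights;
- a fixed 0.5 decision threshold.

All sizes and function names are hypothetical.

```python
import numpy as np


def conv2d_same(x, kernels, bias):
    """x: (T, F) spectrogram; kernels: (C, kt, kf), odd sizes; returns ReLU maps (C, T, F)."""
    C, kt, kf = kernels.shape
    xp = np.pad(x, ((kt // 2, kt // 2), (kf // 2, kf // 2)))  # zero padding keeps T and F
    T, F = x.shape
    out = np.empty((C, T, F))
    for t in range(T):
        for f in range(F):
            patch = xp[t:t + kt, f:f + kf]
            out[:, t, f] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def maxpool_freq(h, size):
    """Non-overlapping max pooling along frequency only; the time axis is untouched."""
    C, T, F = h.shape
    F2 = F // size
    return h[:, :, :F2 * size].reshape(C, T, F2, size).max(axis=3)


def rnn(x, W, U, b):
    """Plain tanh recurrence over the rows of x; returns hidden states (T, H)."""
    h, states = np.zeros(U.shape[0]), []
    for x_t in x:
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)


def crnn_forward(spec, p, pool):
    feat = maxpool_freq(conv2d_same(spec, p["kernels"], p["conv_b"]), pool)  # (C, T, F//pool)
    frames = feat.transpose(1, 0, 2).reshape(spec.shape[0], -1)              # (T, C*F//pool)
    hs = rnn(frames, p["W"], p["U"], p["b"])                                 # (T, H)
    return 1.0 / (1.0 + np.exp(-(hs @ p["V"].T + p["c"])))                    # (T, K) sigmoids


rng = np.random.default_rng(0)
T, F, C, POOL, H, K = 12, 40, 4, 5, 16, 6  # hypothetical sizes
params = {
    "kernels": 0.1 * rng.normal(size=(C, 3, 3)),
    "conv_b": np.zeros(C),
    "W": 0.1 * rng.normal(size=(H, C * (F // POOL))),
    "U": 0.1 * rng.normal(size=(H, H)),
    "b": np.zeros(H),
    "V": rng.normal(size=(K, H)),
    "c": np.zeros(K),
}

probs = crnn_forward(rng.random((T, F)), params, POOL)
active = probs > 0.5  # hypothetical threshold -> binary frame-by-class activity matrix
print(probs.shape, active.dtype)  # (12, 6) bool
```

Pooling only along frequency is the design choice that keeps one output row per input frame. Pooling along time as well would shrink T and force an upsampling step before frame-level detection.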

“…This work was continued in [11], which applied bidirectional long short-term memory recurrent neural networks (BLSTM RNNs) for the same task. It is worth noting that the methods of [10], [11] were only applied on proprietary data.…”

Section: Introduction (mentioning)

confidence: 99%
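The quote mentions bidirectional LSTM RNNs. The bidirectional part can be illustrated separately from the LSTM cell. One recurrence runs forward in time, a second runs over the time-reversed sequence, and the two states are concatenated per frame. Each frame's representation therefore draws on both past and future frames. The numpy sketch below assumes a plain tanh cell in place of LSTM units and uses random weights; all names are hypothetical.

```python
import numpy as np


def run_rnn(x, W, U, b):
    """Plain tanh recurrence over the rows of x (T, F); returns states (T, H)."""
    h, states = np.zeros(U.shape[0]), []
    for x_t in x:
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)


def bidirectional(x, fwd, bwd):
    """Per-frame concatenation of a forward pass and a backward (time-reversed) pass."""
    h_fwd = run_rnn(x, *fwd)
    h_bwd = run_rnn(x[::-1], *bwd)[::-1]  # flip back so row t belongs to frame t
    return np.concatenate([h_fwd, h_bwd], axis=1)


def make_params(rng, H, F):
    """Hypothetical random (W, U, b) for one direction."""
    return 0.3 * rng.normal(size=(H, F)), 0.5 * rng.normal(size=(H, H)), np.zeros(H)


rng = np.random.default_rng(1)
T, F, H = 6, 8, 4
fwd, bwd = make_params(rng, H, F), make_params(rng, H, F)

x = rng.normal(size=(T, F))
x_future = x.copy()
x_future[-1] += 1.0  # perturb only the last frame

y, y2 = bidirectional(x, fwd, bwd), bidirectional(x_future, fwd, bwd)
print(y.shape)                                 # (6, 8)
print(np.array_equal(y[0, :H], y2[0, :H]))     # True: forward half at frame 0 ignores the future
print(np.abs(y[0, H:] - y2[0, H:]).max() > 0)  # True: backward half at frame 0 sees it
```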

Polyphonic Sound Event Tracking Using Linear Dynamical Systems

Benetos, Lafay, Lagrange, et al. (2017)

IEEE/ACM Trans. Audio Speech Lang. Process.

Abstract: In this paper, a system for polyphonic sound event detection and tracking is proposed, based on spectrogram factorisation techniques and state space models. The system extends probabilistic latent component analysis (PLCA) and is modelled around a 4-dimensional spectral template dictionary of frequency, sound event class, exemplar index, and sound state. In order to jointly track multiple overlapping sound events over time, the integration of linear dynamical systems (LDS) within the PLCA inference is proposed. The system assumes that the PLCA sound event activation is the (noisy) observation in an LDS, with the latent states corresponding to the true event activations. LDS training is achieved using fully observed data, making use of ground truth-informed event activations produced by the PLCA-based model. Several LDS variants are evaluated, using polyphonic datasets of office sounds generated from an acoustic scene simulator, as well as real and synthesized monophonic datasets for comparative purposes. Results show that the integration of LDS tracking within PLCA leads to an improvement of +8.5 to 10.5% in terms of frame-based F-measure as compared to the use of the PLCA model alone. In addition, the proposed system outperforms several state-of-the-art methods for the task of polyphonic sound event detection.
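The abstract treats a PLCA activation as the noisy observation of an LDS whose latent state is the true activation, and trains the LDS from fully observed data. The sketch below shows a scalar version of that idea for one event class in Python. It assumes a first-order LDS with identity observation and Gaussian noise, and fits its parameters in closed form from a known state sequence. It then runs a standard Kalman filter. It is not the authors' implementation; the integration inside PLCA inference and the other LDS variants are not shown. The data are synthetic, all names are hypothetical, and the toy example fits and filters the same sequence, which a real evaluation would avoid.

```python
import random
from typing import List, Sequence, Tuple


def fit_lds(states: Sequence[float], obs: Sequence[float]) -> Tuple[float, float, float]:
    """Fit x_t = a*x_{t-1} + w_t, y_t = x_t + v_t from fully observed (states, obs).

    With the states known, the conditional maximum-likelihood estimates are closed
    form: a by least squares, q and r as mean squared residuals.
    """
    pairs = list(zip(states[:-1], states[1:]))
    a = sum(x0 * x1 for x0, x1 in pairs) / sum(x0 * x0 for x0, _ in pairs)
    q = sum((x1 - a * x0) ** 2 for x0, x1 in pairs) / len(pairs)
    r = sum((y - x) ** 2 for x, y in zip(states, obs)) / len(states)
    return a, q, r


def kalman_filter(obs: Sequence[float], a: float, q: float, r: float,
                  m0: float = 0.0, p0: float = 1.0) -> List[float]:
    """Filtered means E[x_t | y_1..y_t] for the scalar LDS above."""
    m, p, out = m0, p0, []
    for y in obs:
        m_pred, p_pred = a * m, a * a * p + q                 # predict
        k = p_pred / (p_pred + r)                              # Kalman gain
        m, p = m_pred + k * (y - m_pred), (1 - k) * p_pred     # update with observation y
        out.append(m)
    return out


rng = random.Random(0)
truth = [0.0] * 30 + [1.0] * 40 + [0.0] * 30      # hypothetical true activation of one class
noisy = [x + rng.gauss(0.0, 0.4) for x in truth]  # stand-in for a noisy PLCA activation

a, q, r = fit_lds(truth, noisy)                   # training from fully observed data
smoothed = kalman_filter(noisy, a, q, r)


def mse(est: Sequence[float]) -> float:
    return sum((e - x) ** 2 for e, x in zip(est, truth)) / len(truth)


print(f"a={a:.3f}  raw MSE={mse(noisy):.3f}  filtered MSE={mse(smoothed):.3f}")
```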

Environmental sound processing and its applications

Miyazaki, Toda, Hayashi, et al. (2019)

IEEJ Transactions Elec Engng

As part of the effort to develop techniques for understanding environments using sound, many studies in the field of computational auditory scene analysis have focused on using computers to perform functions carried out naturally by the human auditory system. Thanks to recent progress in machine‐learning techniques, these environmental sound‐processing techniques have significantly improved and a widening variety of applications has resulted in considerable interest in this field. In this review, we introduce the fundamental techniques of environmental sound processing, as well as recent advances in front‐end and back‐end processing and potential applications for these techniques. Prospects for further progress in the field of environmental sound processing and the challenges still to be overcome are also discussed. © 2019 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
