2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016
DOI: 10.1109/icassp.2016.7472917

Recurrent neural networks for polyphonic sound event detection in real life recordings

Abstract: In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single multilabel BLSTM RNN is trained to map acoustic features of a mixture signal consisting of sounds from multiple classes, to binary activity indicators of each event class. Our method is tested on a large database of real-life recordings, with 61 classes (e.g. music, car, speech) from 10 different everyday contexts. The…

Cited by 283 publications
(187 citation statements)
References 22 publications
“…The main metric used in previous works [11], [14], [15] on TUT-SED 2009 dataset differs from the F1 score calculation used in this paper. In previous works, F1 score was computed in each segment, then averaged along segments for each scene, and finally averaged across scene scores, instead of accumulating intermediate statistics.…”
Section: B Evaluation Metrics (citation type: mentioning)
confidence: 94%
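
The difference between the two F1 calculations described in this statement can be illustrated with a minimal sketch. It assumes that the "intermediate statistics" are true-positive, false-positive and false-negative counts, that each segment is represented as a pair of label sets (reference, predicted), and that a segment with no active events scores 0. The function names and the toy data are hypothetical and are not taken from the cited papers.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    # Assumption: a segment with nothing to detect and nothing detected scores 0.
    return 2 * tp / denom if denom else 0.0


def segment_counts(reference, predicted):
    """Count TP/FP/FN between two sets of active event labels in one segment."""
    tp = len(reference & predicted)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    return tp, fp, fn


def averaged_f1(scenes):
    """F1 per segment, averaged within each scene, then averaged across scenes."""
    scene_scores = []
    for segments in scenes.values():
        scores = [f1_from_counts(*segment_counts(ref, pred)) for ref, pred in segments]
        scene_scores.append(sum(scores) / len(scores))
    return sum(scene_scores) / len(scene_scores)


def accumulated_f1(scenes):
    """One F1 computed from counts accumulated over every segment of every scene."""
    tp = fp = fn = 0
    for segments in scenes.values():
        for ref, pred in segments:
            seg_tp, seg_fp, seg_fn = segment_counts(ref, pred)
            tp += seg_tp
            fp += seg_fp
            fn += seg_fn
    return f1_from_counts(tp, fp, fn)


# Toy data: scene name -> list of (reference labels, predicted labels) per segment.
scenes = {
    "street": [({"car", "speech"}, {"car"}), ({"car"}, {"car"})],
    "home": [({"music"}, {"speech"})],
}

print(averaged_f1(scenes))
print(accumulated_f1(scenes))
```

On this toy input the two procedures give different scores, so results reported under one convention are not directly comparable with results reported under the other.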
“…With the emergence of more advanced deep learning techniques and publicly available real-life databases that are suitable for the task, polyphonic SED has attracted more interest in recent years. Non-negative matrix factorization (NMF) based source separation [14] and deep learning based methods (such as feedforward neural networks (FNN) [15], CNN [16] and RNN [11]) have been shown to perform significantly better compared to established methods such as GMM-HMM for polyphonic SED.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…This work was continued in [11], which applied bidirectional long short term memory recurrent neural networks (BLSTM RNNs) for the same task. It is worth noting that the methods of [10] [11] were only applied on proprietary data.…”
Section: Introduction (citation type: mentioning)
confidence: 99%