2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017
DOI: 10.1109/icassp.2017.7952260
Sound event detection using spatial features and convolutional recurrent neural network

Abstract: This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages. We show that instead of concatenating the features of each channel into a single feature vector the network learns sound events in multichannel audio better when they are presented as separate layers of a volume. Using the …
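The abstract contrasts two ways of arranging multichannel features for the network's input. A minimal sketch of the difference, using NumPy and illustrative shapes (the channel count, frame count, and band count here are assumptions, not the paper's exact configuration):

```python
import numpy as np

# Hypothetical multichannel feature tensor for a 2-channel recording:
# (channels, time frames, feature bands). Shapes are illustrative only.
channels, frames, bands = 2, 100, 40
features = np.random.rand(channels, frames, bands)

# Concatenation: per-frame feature vectors of each channel joined end to end,
# yielding a single (frames, channels * bands) matrix.
concatenated = np.concatenate([features[c] for c in range(channels)], axis=-1)

# Volume stacking (the arrangement the abstract favors): channels kept as
# separate layers of a (frames, bands, channels) volume, analogous to the
# color planes of an image fed to a 2-D convolutional layer.
volume = np.stack([features[c] for c in range(channels)], axis=-1)

print(concatenated.shape)  # (100, 80)
print(volume.shape)        # (100, 40, 2)
```

With the volume arrangement, a convolutional front end can learn inter-channel patterns within each local time-frequency region, which the flattened concatenation obscures.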


Cited by 126 publications (101 citation statements)
References 25 publications (44 reference statements)
“…We acquired the strong labels of most of the training dataset recordings via manual annotations, to be used only for evaluation purposes. 2 For our training set the first 499 recordings of the NIPS4B 2013 training dataset are used, while the rest are included in our testing set, excluding 14 recordings for which confident strong annotations could not be attained. Those 14 recordings were added to our training set totalling to 513 training recordings and 174 testing recordings.…”
Section: Discussion
confidence: 99%
“…Most recent advances in polyphonic SED are largely attributed to the use of Machine Learning and Deep Neural Networks [8,9,10,11,12,13]. In particular, the use of Convolutional Recurrent Neural Networks (CRNNs) has significantly improved SED performance in the past few years [14,15,16,17]. However, there are three main disadvantages with current CRNN-based polyphonic SED approaches.…”
Section: Related Work
confidence: 99%
“…In this paper, we want to exploit the capabilities of both QNNs and Ambisonics to analyze 3D sounds, and in particular we focus on the localization and detection of 3D sound events. Both tasks have been widely investigated recently by using convolutional neural networks (CNNs) [19][20][21][22][23][24][25]. They are also considered as a joint task in [26] for 3D sounds, but considering each microphone signal as a separate real-valued signal.…”
Section: Introduction
confidence: 99%