2020
DOI: 10.1109/access.2020.3015047
MTF-CRNN: Multiscale Time-Frequency Convolutional Recurrent Neural Network for Sound Event Detection

Abstract: To reduce neural network parameter counts and improve sound event detection performance, we propose a multiscale time-frequency convolutional recurrent neural network (MTF-CRNN) for sound event detection. Our goal is to recognize target sound events of variable duration against different audio backgrounds while keeping the parameter count low. We exploit four groups of parallel and serial convolutional kernels to learn high-level shift-invariant features from the time and freque…
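The abstract is truncated above, but the design it describes (parallel multi-scale time-frequency convolutions feeding a recurrent layer that emits frame-level event probabilities) can be sketched. The following is a minimal, illustrative PyTorch sketch, not the authors' implementation: the class names, kernel shapes, channel counts, pooling sizes, and GRU width are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class MultiScaleTFBlock(nn.Module):
    """Parallel conv branches with different time/frequency kernel shapes.
    Kernel shapes and channel counts are illustrative, not the paper's exact configuration."""
    def __init__(self, in_ch, out_ch_per_branch=16):
        super().__init__()
        # Four parallel branches: square, frequency-oriented, time-oriented, pointwise.
        kernel_shapes = [(3, 3), (1, 5), (5, 1), (1, 1)]
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch_per_branch, k, padding=(k[0] // 2, k[1] // 2)),
                nn.BatchNorm2d(out_ch_per_branch),
                nn.ReLU(),
            )
            for k in kernel_shapes
        ])

    def forward(self, x):                 # x: (batch, channels, time, freq)
        # Concatenate the branch outputs along the channel axis.
        return torch.cat([b(x) for b in self.branches], dim=1)

class SketchMTFCRNN(nn.Module):
    """Hypothetical multiscale CRNN: stacked multi-scale conv blocks + bidirectional GRU."""
    def __init__(self, n_mels=64, n_classes=10):
        super().__init__()
        self.block1 = MultiScaleTFBlock(1)
        self.pool1 = nn.MaxPool2d((1, 4))      # pool along frequency, keep time resolution
        self.block2 = MultiScaleTFBlock(64)
        self.pool2 = nn.MaxPool2d((1, 4))
        rnn_in = 64 * (n_mels // 16)
        self.gru = nn.GRU(rnn_in, 64, bidirectional=True, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                      # x: (batch, 1, time, n_mels)
        x = self.pool1(self.block1(x))
        x = self.pool2(self.block2(x))
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)   # frame-wise feature vectors
        x, _ = self.gru(x)
        return torch.sigmoid(self.head(x))     # per-frame event probabilities

# Usage example: two clips, 500 frames, 64 mel bands -> (2, 500, 10) frame-level probabilities.
model = SketchMTFCRNN()
probs = model(torch.randn(2, 1, 500, 64))
```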

Cited by 13 publications (5 citation statements) | References 38 publications
“…Zhang et al 14 propose an AED module called Multi-Scale Time-Frequency Attention (MTFA); it tells the model where to focus along the time and frequency axes by collecting information at different resolutions, which earlier work had not addressed. Zhang et al 15 and Shen et al 16 proposed a multiscale time-frequency convolutional recurrent neural network (MTF-CRNN) for sound event detection 15 .…”
Section: Related Work (mentioning)
confidence: 99%
“…In [39], the Multi-level Convolutional Pyramid Semantic Fusion (MCPSF) framework was proposed to integrate multi-level semantic features extracted by a bag-of-visual-words (BoVW) model and a convolutional neural network (CNN) model. Zhang et al [40] proposed a Multi-scale Time-Frequency Convolutional Recurrent Neural Network (MTF-CRNN), operating on sound time-frequency maps, to improve sound event detection performance. Ding et al [41] proposed an Adaptive Multi-scale Detection (AdaMD) method, based on an hourglass neural network and a Gated Recurrent Unit (GRU) module, to extract characteristics of the time-frequency map at different scales.…”
Section: B. Multi-level Structure (mentioning)
confidence: 99%
“…Gaussian mixture models [6] and hidden Markov models [7] were initially used. Following advances in deep learning, SED methods based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs) were introduced [8,9,10,11]. SED performs functions of the human auditory system in several domains, including audio surveillance [12] and social welfare [13].…”
Section: Introduction (mentioning)
confidence: 99%