2018 International Joint Conference on Neural Networks (IJCNN) 2018
DOI: 10.1109/ijcnn.2018.8489641

Learning Environmental Sounds with Multi-scale Convolutional Neural Network

Abstract: Deep learning has dramatically improved the performance of sound recognition. However, learning acoustic models directly from the raw waveform remains challenging. Current waveform-based models generally use time-domain convolutional layers to extract features, but features extracted by filters of a single size are insufficient for building a discriminative representation of audio. In this paper, we propose a multi-scale convolution operation, which can obtain a better audio representation by improving the frequency reso…
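
The idea of convolving a raw waveform with filters of several sizes can be illustrated with a minimal NumPy sketch. It is not the paper's architecture: the kernel sizes (11, 51, 101), the stride, the channel count, the random filter weights, and the helper names `conv1d_same` and `multi_scale_features` are all assumptions chosen for illustration. Each branch applies a bank of 1-D filters of one size, and the branch outputs are stacked along the channel axis.

```python
import numpy as np


def conv1d_same(x, kernels, stride=1):
    """Cross-correlate a 1-D signal with a bank of filters ("same" zero padding).

    x: array of shape (T,); kernels: array of shape (C, K).
    Returns an array of shape (C, ceil(T / stride)).
    """
    _, k = kernels.shape
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, (pad_left, pad_right))
    # One window of length K per input sample, then subsample by the stride.
    windows = np.lib.stride_tricks.sliding_window_view(xp, k)[::stride]
    return kernels @ windows.T


def multi_scale_features(x, kernel_sizes=(11, 51, 101), channels=4, stride=8, seed=0):
    """Hypothetical multi-scale front end: one filter bank per kernel size.

    Filter weights are random placeholders (a trained model would learn them).
    All branches share the same stride, so their outputs have equal length
    and can be concatenated along the channel axis.
    """
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        kernels = rng.standard_normal((channels, k)) / np.sqrt(k)
        branches.append(np.maximum(conv1d_same(x, kernels, stride), 0.0))  # ReLU
    return np.concatenate(branches, axis=0)


if __name__ == "__main__":
    t = np.arange(1600) / 16000.0            # 0.1 s at an assumed 16 kHz rate
    wave = np.sin(2 * np.pi * 440.0 * t)     # synthetic 440 Hz tone
    feats = multi_scale_features(wave)
    print(feats.shape)                       # (channels * number of scales, frames)
```

In this sketch, short filters respond to fine temporal detail while long filters span more periods of a low-frequency component, which is one informal way to read the abstract's claim about single-size filters being insufficient.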

Citations: cited by 59 publications (36 citation statements)
References: 25 publications
“…We see that ACRNN outperforms PiczakCNN and obtains an absolute improvement of 13.2% and 21.2% on ESC-10 and ESC-50 datasets, respectively. Then, we compare our model with several state-of-the-art methods: SoundNet8 [1], WaveMsNet [28], EnvNet-v2 [21] and Multi-Stream CNN [12]. We observe that on both ESC-10 and ESC-50 datasets, ACRNN obtains the highest classification accuracy.…”
Section: 1% (mentioning)
confidence: 98%
“…
Model                   ESC-10   ESC-50
PiczakCNN [15]          80.5%    64.9%
SoundNet [1]            92.1%    74.2%
WaveMsNet [28]          93.7%    79.1%
EnvNet-v2 [21]          91.4%    84.9%
Multi-Stream CNN [12]   93.7%    83.5%
ACRNN                   93.7%    …
”
Section: Model (mentioning)
confidence: 99%
“…More recent research has experimented with features at different temporal scales by merging the RNN outputs over time in a stacked RNN architecture [12] and with modifying spatial resolution by applying different filters to the input [13]. Other approaches towards improving ESC performance include higher-level input features such as MFCCs, gammatone features, or specialized filters [4,13] and training with data augmentation [3,14]. Multiple loss functions were used for the detection of rare sound events in [12].…”
Section: Related Work (mentioning)
confidence: 99%