2022
DOI: 10.13052/jwe1540-9589.21216

Convolutional Neural Networks Using Log Mel-Spectrogram Separation for Audio Event Classification with Unknown Devices

Abstract: Audio event classification refers to the detection and classification of non-verbal signals, such as dog and horn sounds included in audio data, by a computer. Recently, deep neural network technology has been applied to audio event classification, exhibiting higher performance when compared to existing models. Among them, a convolutional neural network (CNN)-based training method that receives audio in the form of a spectrogram, which is a two-dimensional image, has been widely used. However, audio event clas…

Cited by 7 publications (6 citation statements)
References 13 publications (18 reference statements)
“…In this experiment, vanilla MobileNet v2 was less accurate than VGG-Resnet, although the MobileNet v2 model performed better after data augmentation. VGG-Resnet had a comparable performance in the audio event classification domain [32]. MobileNet v2 with time and frequency masking was considerably more accurate than the VGG-Resnet model.…”
Section: Results (citation type: mentioning)
confidence: 94%
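The time and frequency masking mentioned in the statement above can be illustrated with a minimal sketch. It assumes a spectrogram stored as a list of rows (mel bins by frames) and masks one frequency band and one time span with a fill value. The function name `mask_spectrogram`, its parameters, and the choice of a single mask per axis are hypothetical and are not taken from the citing paper.

```python
import random


def mask_spectrogram(spec, max_freq_width, max_time_width, rng, fill=0.0):
    """Return a masked copy of spec (rows = mel bins, columns = frames).

    Hypothetical helper: one random frequency band and one random time
    span are overwritten with `fill`. The input is left unchanged.
    """
    n_mels = len(spec)
    n_frames = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is not modified

    # Frequency mask: blank out f_width consecutive mel bins.
    f_width = rng.randint(0, min(max_freq_width, n_mels))
    f_start = rng.randint(0, n_mels - f_width)
    for m in range(f_start, f_start + f_width):
        out[m] = [fill] * n_frames

    # Time mask: blank out t_width consecutive frames in every bin.
    t_width = rng.randint(0, min(max_time_width, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = fill

    return out


# Toy input: 8 mel bins by 10 frames, all ones.
spec = [[1.0] * 10 for _ in range(8)]
masked = mask_spectrogram(spec, max_freq_width=3, max_time_width=4,
                          rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the augmentation reproducible. In practice the mask widths would be tuned per dataset.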
“…There were five versions of the AEC model: The results were compared with those for VGG-Resnet, which is an ensemble version of a CNN-based VGG network (VGGnet) [32] and a residual network [33].…”
Section: Results (citation type: mentioning)
confidence: 99%
“…The problem caused by differences in frequency‐domain emphasis by various audio input devices was addressed in this study's audio preprocessing step using a log‐Mel spectrogram because it can provide more accurate and detailed characteristics in the high‐ and low‐frequency domains than the Mel‐spectrogram [48]. In addition, log‐Mel spectrograms can improve performance, as demonstrated by the DCASE 2020 challenge for audio scene classification [48]. At a sampling rate of 16000, each 3‐s sample was processed to create a single log‐Mel time‐frequency spectrogram.…”
Section: Methods (citation type: mentioning)
confidence: 99%
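As a rough illustration of the preprocessing described in the statement above, the sketch below works out the size of a 3-s clip at 16000 Hz and the basic ingredients of a log-Mel representation. Only the sampling rate and clip length come from the quoted text. The window length (400 samples), hop length (160 samples), the HTK-style mel formula, and the dB floor are assumptions chosen for illustration, not the cited study's settings.

```python
import math

SAMPLE_RATE = 16000   # Hz, from the quoted statement
CLIP_SECONDS = 3      # s, from the quoted statement
N_FFT = 400           # assumed window length (25 ms), not from the source
HOP = 160             # assumed hop length (10 ms), not from the source


def frame_count(n_samples, n_fft, hop):
    """Number of full analysis windows when the signal is not padded."""
    return 1 + (n_samples - n_fft) // hop


def hz_to_mel(f_hz):
    """HTK-style mel scale (one common convention among several)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def to_log_scale(power, floor=1e-10):
    """Log compression in dB; the floor avoids log(0) on silent bins."""
    return 10.0 * math.log10(max(power, floor))


n_samples = SAMPLE_RATE * CLIP_SECONDS            # 48000 samples per clip
n_frames = frame_count(n_samples, N_FFT, HOP)     # 298 frames under these assumptions
nyquist_mel = hz_to_mel(SAMPLE_RATE / 2)          # upper edge of the mel axis
```

Under these assumed settings each clip yields a spectrogram with 298 time frames. The number of mel bins is a separate choice that the quoted statement does not specify.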
“…CNNs excel in classifying audio signals across a spectrum of categories, encompassing speech, music, and environmental sounds. Their proficiency extends to tasks such as speech recognition, speaker identification, and even emotion recognition [18]- [22]. On the other hand, RNNs demonstrate prowess in audio classification and segmentation, effectively disassembling and categorizing audio data with remarkable accuracy [23]- [27].…”
Section: Introduction (citation type: mentioning)
confidence: 99%