Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms

Avramidis, Kleanthis; Kratimenos, Agelos; Garoufis, Christos; Zlatintsi, Athanasia; Maragos, Petros

doi:10.1109/icassp39728.2021.9413479

Cited by 8 publications

(1 citation statement)

References 11 publications

“…With the continuous development of deep learning technology, the use of neural network technology for image and signal processing has become the choice of more and more researchers [5, 6]. Especially in speech- and audio-related tasks [7, 8], neural network techniques have performed better than traditional machine learning algorithms. Neural networks extract critical features from audio signals to classify ambient sounds efficiently and accurately [9, 10, 11].…”

Section: Introduction (mentioning)

confidence: 99%

An Automatic Classification System for Environmental Sound in Smart Cities

Zhang, Zhong, Xia et al. 2023

Sensors

With the continuous promotion of “smart cities” worldwide, the approach to be used in combining smart cities with modern advanced technologies (Internet of Things, cloud computing, artificial intelligence) has become a hot topic. However, due to the non-stationary nature of environmental sound and the interference of urban noise, it is challenging to fully extract features from the model with a single input and achieve ideal classification results, even with deep learning methods. To improve the recognition accuracy of ESC (environmental sound classification), we propose a dual-branch residual network (dual-resnet) based on feature fusion. Furthermore, in terms of data pre-processing, a loop-padding method is proposed to patch shorter data, enabling it to obtain more useful information. At the same time, in order to prevent the occurrence of overfitting, we use the time-frequency data enhancement method to expand the dataset. After uniform pre-processing of all the original audio, the dual-branch residual network automatically extracts the frequency domain features of the log-Mel spectrogram and log-spectrogram. Then, the two different audio features are fused to make the representation of the audio features more comprehensive. The experimental results show that compared with other models, the classification accuracy of the UrbanSound8k dataset has been improved to different degrees.
