ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413479

Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms

Abstract: Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms. However, the emergence of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification purposes. In this paper, we attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models. Various recurrent and convolutional architectures incorporatin…
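To make the idea of classifying directly from raw waveforms concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper, which the truncated abstract does not specify. The layer sizes, the random untrained weights, the one-sigmoid-per-instrument output, and the names `conv1d` and `predict_instruments` are all assumptions chosen for illustration.

```python
import math
import random


def conv1d(signal, kernel, stride=1):
    """Valid 1-D cross-correlation of a waveform with a single kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_instruments(waveform, kernels, weights, biases, stride=4):
    """Return one independent probability per instrument (multi-label)."""
    # One pooled feature per kernel: convolution -> ReLU -> global max pool.
    features = [max(relu(conv1d(waveform, k, stride))) for k in kernels]
    # One sigmoid per class (an assumption), so that several instruments
    # can be flagged as present in the same polyphonic clip.
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]


# Toy input (hypothetical values): 0.1 s of a 440 Hz sine sampled at 16 kHz.
sample_rate = 16000
waveform = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]

# Untrained random parameters, for shape illustration only.
rng = random.Random(0)
num_kernels, kernel_size, num_classes = 8, 64, 3
kernels = [
    [rng.uniform(-0.1, 0.1) for _ in range(kernel_size)]
    for _ in range(num_kernels)
]
weights = [
    [rng.uniform(-1, 1) for _ in range(num_kernels)]
    for _ in range(num_classes)
]
biases = [0.0] * num_classes

probs = predict_instruments(waveform, kernels, weights, biases)
```

The convolution kernels here stand in for the learned filters that would replace a hand-crafted spectrogram front end. Because the parameters are random, the values in `probs` carry no meaning; only the data flow from samples to per-instrument scores is illustrated.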

Cited by 8 publications (1 citation statement)
References 11 publications
“…With the continuous development of deep learning technology, the use of neural network technology for image and signal processing has become the choice of more and more researchers [ 5 , 6 ]. Especially in speech- and audio-related tasks [ 7 , 8 ], neural network techniques have performed better than traditional machine learning algorithms. Neural networks extract critical features from audio signals to classify ambient sounds efficiently and accurately [ 9 , 10 , 11 ].…”
Section: Introduction (citation type: mentioning)
confidence: 99%