2021
DOI: 10.48550/arxiv.2102.06930
Preprint
Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms

Abstract: Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms. However, the emergence of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification purposes. In this paper, we attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models. Various recurrent and convolutional architectures incorporating […]

Cited by 2 publications (4 citation statements, all citing publications dated 2022); references 5 publications.
“…The first consists of extracting a feature vector (FV) containing audio descriptors and using baseline machine learning algorithms [12,15–27]. The second is based on a 2D audio representation and a deep learning model [28–41], or a more automated version in which a variational or deep softmax autoencoder is used to retrieve the audio representation [32,42]. Therefore, by employing machine learning, it is possible to implement a classifier for particular genres or instrument recognition.…”
Section: Metadata
confidence: 99%
“…Reviewing the literature on the classification of musical instruments, it can be seen that the field has been in development for almost three decades [17,18,25,28,36,41]. These works use various sets of signal and statistical parameters for the analyzed samples, standard MPEG-7 descriptors, spectrograms, mel-frequency cepstral coefficients (MFCC), or the constant-Q transform (CQT) as the basis for their operation.…”
Section: Metadata
confidence: 99%