2016
DOI: 10.1121/1.4946988

Sparse feature learning for instrument identification: Effects of sampling and pooling methods

Abstract: Feature learning for music applications has recently received considerable attention from many researchers. This paper reports on a sparse feature learning algorithm for musical instrument identification and focuses in particular on the effects of the frame sampling technique used for dictionary learning and the pooling method used for feature aggregation. To this end, two frame sampling techniques are examined: fixed and proportional random sampling. Furthermore, the effect of using the onset frame was analyz…
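The abstract names the two frame sampling techniques but, in this excerpt, does not define them. The sketch below assumes one plausible reading: "fixed" random sampling draws the same number of frames from every recording, while "proportional" random sampling draws a number of frames proportional to each recording's length. The function names and parameters (n_per_clip, total) are hypothetical and not taken from the paper; the pooled frames would then serve as training data for dictionary learning.

```python
import numpy as np

def fixed_random_sampling(clips, n_per_clip, rng):
    """Hypothetical 'fixed' sampling: the same number of frames from every clip.

    clips: list of arrays, each of shape (n_frames, n_features).
    """
    picks = []
    for frames in clips:
        k = min(n_per_clip, len(frames))  # short clips give up all their frames
        idx = rng.choice(len(frames), size=k, replace=False)
        picks.append(frames[idx])
    return np.vstack(picks)

def proportional_random_sampling(clips, total, rng):
    """Hypothetical 'proportional' sampling: frame quota scales with clip length."""
    lengths = np.array([len(f) for f in clips])
    # Each clip's share of `total`, rounded down, so the result may fall slightly short
    quotas = np.floor(total * lengths / lengths.sum()).astype(int)
    picks = []
    for frames, k in zip(clips, quotas):
        k = min(int(k), len(frames))
        idx = rng.choice(len(frames), size=k, replace=False)
        picks.append(frames[idx])
    return np.vstack(picks)

# Usage: two clips of 100 and 300 frames with 4 features each
rng = np.random.default_rng(0)
clips = [np.ones((100, 4)), np.ones((300, 4))]
fixed = fixed_random_sampling(clips, 50, rng)          # 50 + 50 frames
prop = proportional_random_sampling(clips, 80, rng)    # 20 + 60 frames
```

Under this reading, fixed sampling gives every recording equal weight in the dictionary training set, whereas proportional sampling lets longer recordings contribute more frames.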

Cited by 17 publications (11 citation statements). References: 2 publications.
“…Pooling significantly reduces the computational complexity for the processing steps. Max-pooling and average-pooling are two of the most common pooling methods across various tasks [ 40 ]. In this research, max-pooling is chosen for the resolution reduction.…”
Section: Learning Methods (mentioning)
confidence: 99%
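The quoted passage contrasts max-pooling and average-pooling as ways to reduce resolution. As a minimal illustration, assuming non-overlapping rectangular windows over a 2-D feature map (for example a time-frequency representation), both operations can be written with a single reshape. The helper name pool2d and its parameters are hypothetical and are not taken from the citing work.

```python
import numpy as np

def pool2d(x, ph, pw, op=np.max):
    """Non-overlapping pooling of a 2-D array with a (ph, pw) window.

    op is the reduction applied inside each window, e.g. np.max or np.mean.
    Rows/columns that do not fill a complete window are dropped.
    """
    h, w = x.shape
    h2, w2 = h // ph, w // pw
    x = x[: h2 * ph, : w2 * pw]
    # Axes 1 and 3 index positions inside each window
    return op(x.reshape(h2, ph, w2, pw), axis=(1, 3))

x = np.arange(16).reshape(4, 4)
max_pooled = pool2d(x, 2, 2, np.max)   # [[5, 7], [13, 15]]
avg_pooled = pool2d(x, 2, 2, np.mean)  # [[2.5, 4.5], [10.5, 12.5]]
```

Both outputs have a quarter of the input's entries, which is where the reduction in downstream computation comes from; max-pooling keeps the strongest response in each window, while average-pooling keeps its mean.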
“…Pooling operations significantly reduce the computational complexity. Max-pooling and average-pooling are the two most common pooling methods across various tasks [35]. In this research, max-pooling can be expressed as…”
Section: Pooling Layer (mentioning)
confidence: 99%
“…A mel-scale is based on the human auditory system and is approximately logarithmic above 1 kHz [23]. We used 128 mel-frequency bins following representation learning research on music annotation [24], [25], musical instrument identification task [26], and fingering detection of overblown flute sound [27]; this is a reasonable size that sufficiently retains the original spectral characteristics, while significantly reducing the dimensionality of the data.…”
Section: A. Audio Preprocessing (mentioning)
confidence: 99%
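To make the quoted description of the mel scale concrete, the sketch below uses the common HTK-style formula mel = 2595 log10(1 + f / 700); which mel variant the citing work used is not stated here, so this choice is an assumption. It places 128 triangular-filter bands equally spaced in mel between 0 Hz and an assumed upper limit of 11025 Hz (half of an assumed 22.05 kHz sampling rate). The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel formula (assumed variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(n_mels=128, fmin=0.0, fmax=11025.0):
    """Edge frequencies (Hz) for n_mels triangular filters.

    Triangular filters need n_mels + 2 points: each band spans
    edges[i] .. edges[i + 2] and peaks at edges[i + 1].
    """
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    return mel_to_hz(mels)

edges = mel_band_edges()
widths = np.diff(edges)  # band spacing grows with frequency
```

With this formula, 1000 Hz maps to roughly 1000 mel, and the spacing between successive edges widens with frequency: equal steps in mel become increasingly large steps in Hz, which is the approximately logarithmic behaviour above 1 kHz that the quoted passage describes. Summarizing a spectrum with 128 such bands is what yields the dimensionality reduction it mentions.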