2012
DOI: 10.1007/978-3-642-35341-3_19

Enhancing Music Information Retrieval by Incorporating Image-Based Local Features

Abstract: This paper presents a novel approach to music genre classification. Having represented music tracks in the form of two-dimensional images, we apply the "bag of visual words" method from visual IR to classify the songs into 19 genres. By switching to the visual domain, we can abstract from musical concepts such as melody, timbre and rhythm. We obtained a classification accuracy of 46% (with a 5% theoretical baseline for random classification), which is comparable with existing state-of-the-art approac…
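
The pipeline the abstract outlines (track → 2-D image → local features → bag of visual words) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the image representation, the local descriptor, the codebook size or the classifier. The log-magnitude STFT, the raw-pixel patches, k-means with k = 16 and every function name (`spectrogram_image`, `build_codebook`, `bovw_histogram`, `random_patches`) are therefore hypothetical choices, and the genre classifier itself is omitted. `RANDOM_BASELINE` only restates the abstract's baseline: guessing uniformly among 19 genres is right 1/19 ≈ 5.3% of the time, which the abstract rounds to 5%.

```python
import numpy as np

RANDOM_BASELINE = 1 / 19  # chance accuracy for 19 equiprobable genres, about 0.053


def spectrogram_image(signal, frame_len=1024, hop=512):
    """Log-magnitude STFT as a 2-D array (frequency x time), min-max scaled to [0, 1]."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop  # assumes len(signal) >= frame_len
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    img = np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T  # rows: frequency bins, cols: frames
    return (img - img.min()) / (img.max() - img.min() + 1e-12)


def assign(descriptors, centers):
    """Index of the nearest codeword (squared Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Visual vocabulary via plain k-means (Lloyd's algorithm) on pooled training descriptors."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers


def bovw_histogram(descriptors, centers):
    """Normalized histogram of codeword occurrences: the fixed-length 'bag of visual words'."""
    counts = np.bincount(assign(descriptors, centers), minlength=len(centers)).astype(float)
    return counts / counts.sum()


def random_patches(img, n, p=8, seed=0):
    """Flattened p x p patches at random positions, used here as stand-in local descriptors."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, img.shape[0] - p + 1, size=n)
    cols = rng.integers(0, img.shape[1] - p + 1, size=n)
    return np.array([img[r:r + p, c:c + p].ravel() for r, c in zip(rows, cols)])


# Usage: a 1 s synthetic "track" (440 Hz tone plus noise at a hypothetical 22.05 kHz rate).
rng = np.random.default_rng(1)
t = np.arange(22050) / 22050.0
track = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
img = spectrogram_image(track)  # shape (513, 42)
codebook = build_codebook(random_patches(img, 200, seed=2), k=16)
hist = bovw_histogram(random_patches(img, 50, seed=3), codebook)  # 16-bin feature vector
```

Whatever a track's duration, it ends up as one fixed-length histogram, which is what allows an ordinary classifier to compare tracks directly.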

Cited by 1 publication (1 citation statement)
References: 11 publications

“…In [31], Matsui et al. first extracted SIFT keypoints [27] from the spectrogram and then clustered these keypoints based on their descriptors to form a musical feature for genre classification. In [32], Kaliciak et al. first generated a set of local spectrogram patches by combining a corner detector [33] with a random points generator and then characterized these local patches in the form of a co-occurrence matrix or color moments as was done in [34]. These local patch descriptors are finally employed for music genre classification by using the 'bag-of-visual-words' approach.…”
Section: Robust Spectrogram Image Feature Extraction (mentioning; confidence: 99%)
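
The patch-description step that the statement above attributes to Kaliciak et al. [32] can be sketched as follows, as an illustration rather than the cited method. Several assumptions are made. The quote does not name the corner detector in [33], so a basic Harris response (without non-maximum suppression) stands in for it. The co-occurrence matrix uses 8 gray levels and a one-pixel horizontal offset. Because a spectrogram image is single-channel, "color moments" reduce here to the mean, standard deviation and cube root of the third central moment of intensity. All function names and parameters (`harris_points`, `random_points`, `cooccurrence`, `intensity_moments`, `patch_descriptors`, patch size `p`) are hypothetical.

```python
import numpy as np


def box_blur(a, r=1):
    """Mean filter over a (2r+1) x (2r+1) window, edge-padded so the output keeps a's shape."""
    padded = np.pad(a, r, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (2 * r + 1) ** 2


def harris_points(img, n, p=8, k=0.05):
    """Centers of the n strongest Harris responses whose p x p patch fits inside img."""
    gy, gx = np.gradient(img.astype(float))
    sxx, syy, sxy = box_blur(gx * gx), box_blur(gy * gy), box_blur(gx * gy)
    response = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    half, (h, w) = p // 2, img.shape
    response[:half, :] = -np.inf  # mask centers whose patch would leave the image
    response[h - p + half + 1:, :] = -np.inf
    response[:, :half] = -np.inf
    response[:, w - p + half + 1:] = -np.inf
    best = np.argsort(response, axis=None)[::-1][:n]  # top n, no non-maximum suppression
    return np.column_stack(np.unravel_index(best, img.shape))


def random_points(shape, n, p=8, seed=0):
    """n uniformly random patch centers, drawn from the same valid range as harris_points."""
    rng = np.random.default_rng(seed)
    half, (h, w) = p // 2, shape
    rows = rng.integers(half, h - p + half + 1, size=n)
    cols = rng.integers(half, w - p + half + 1, size=n)
    return np.column_stack([rows, cols])


def cooccurrence(patch, levels=8, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix; patch values assumed in [0, 1], offset >= 0."""
    q = np.minimum((patch * levels).astype(int), levels - 1)
    dy, dx = offset
    a = q[:q.shape[0] - dy, :q.shape[1] - dx]  # reference pixels
    b = q[dy:, dx:]  # neighbors at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    return m / m.sum()


def intensity_moments(patch):
    """Single-channel analogue of color moments: mean, std, cube root of 3rd central moment."""
    mu = patch.mean()
    return np.array([mu, patch.std(), np.cbrt(((patch - mu) ** 3).mean())])


def patch_descriptors(img, n_corner=20, n_random=20, p=8, kind="glcm", seed=0):
    """One descriptor row per patch, sampled at corner points plus random points."""
    points = np.vstack([harris_points(img, n_corner, p), random_points(img.shape, n_random, p, seed)])
    half = p // 2
    rows = []
    for r, c in points:
        patch = img[r - half:r - half + p, c - half:c - half + p]
        rows.append(cooccurrence(patch).ravel() if kind == "glcm" else intensity_moments(patch))
    return np.array(rows)


# Usage on a random stand-in for a spectrogram image with values in [0, 1).
img = np.random.default_rng(0).random((64, 48))
glcm_feats = patch_descriptors(img, kind="glcm")  # shape (40, 64): 20 corner + 20 random patches
moment_feats = patch_descriptors(img, kind="moments")  # shape (40, 3)
```

Each row is one local patch descriptor. In a bag-of-visual-words setup these rows would next be quantized against a learned codebook, and the codeword counts would form the per-track feature vector.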