Musical genre classification using melody features extracted from polyphonic music signals

Rocha, Bruno

doi:10.1109/icassp.2012.6287822

Cited by 55 publications

(25 citation statements)

References 7 publications

(9 reference statements)


“…For Classify, most works measure MGR performance by classification accuracy (the ratio of "correct" predictions to all observations) computed from k-fold stratified cross-validation (kfCV), e.g., 2fCV (4 papers) [7,22,23,56], 3fCV (3 papers) [18,71,74], 5fCV (6 papers) [3,13,30,31,53,100], and 10fCV (55 papers) [2,5,9,11,14,16,17,24-26,28,29,34,35,37,39-42,44,47-51,57,58,60-64,66-68,70,72,73,75,76,78,79,82-85,88-91,94-96,98,99]. Most of these use a single run of cross-validation; however, some perform multiple runs, e.g., 10 independent runs of 2fCV (10x2CV) [56] or 20x2fCV [22,23], 10x3fCV [71,74], and 10x10fCV [37,70,72,75,83-85]. In one experiment, Li and Sleep [42] use 10fCV with random partitions; but in another, they partition the excerpts into folds based on their file number, roughly implementing an artist filter.…”

Section: Using GTZAN (mentioning)

confidence: 99%

The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval

Sturm

2014

Journal of New Music Research


The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.
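The evaluation protocol described in the quoted statement above (classification accuracy computed from stratified k-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the procedure of any cited paper; the names `stratified_folds`, `accuracy`, `cross_validate`, and the caller-supplied `train_and_predict` callback are all hypothetical, and the sketch does not implement an artist filter.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign every index to one of k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def accuracy(predicted, actual):
    """Ratio of correct predictions to all observations."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def cross_validate(features, labels, k, train_and_predict, seed=0):
    """Mean accuracy over k folds.

    train_and_predict(train_x, train_y, test_x) is an assumed callback that
    returns one predicted label per test item. Assumes every fold is non-empty,
    i.e. the dataset has at least k items.
    """
    folds = stratified_folds(labels, k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = train_and_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        scores.append(accuracy(predictions, [labels[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

Repeating `cross_validate` with different `seed` values and averaging the results would correspond to the multiple-run variants mentioned in the statement (for example 10x10fCV).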


“…Salamon et al. [30] combined MFCC with melodic high-level features that describe the pitch contour of the main melody in polyphonic music recordings. The authors reported a classification accuracy of 82% for five music genres.…”

Section: Related Work (mentioning)

confidence: 99%

Improved music similarity computation based on tone objects

Krasser

Abeßer

Grossmann

et al.

2012

Proceedings of the 7th Audio Mostly Conference: A Conference on Interaction With Sound


In this paper, we propose a novel approach for music similarity estimation. It combines temporal segmentation of music signals with source separation into so-called tone objects. We solely use the timbre-related audio features Mel-Frequency Cepstral Coefficients (MFCC) and Octave-based Spectral Contrast (OSC) to describe the extracted tone objects. First, we compare our approach to a baseline system that employs frame-wise feature extraction and bag-of-frames classification. Second, we set up a system that extracts features on perfectly isolated single track recordings, achieving near perfect classification. Finally, we compare our novel approach against the basis experiments. We find that it clearly outperforms the baseline system in a five-class genre classification task. Our results indicate that tone object based feature extraction clearly improves music similarity estimation.


“…They used the log energies and Mel-frequency cepstrum coefficients as the musical feature, and used a support vector machine as a classifier. J. Salamon et al. [13]…”

Section: Related Work (mentioning)

confidence: 99%

Music Genre Classification of MPEG AAC Audio Data

Kobayakawa

Hoshi

Yuzawa

2014

2014 IEEE International Symposium on Multimedia


In this paper, we propose a musical feature extracted from the bitstream of AAC (Advanced Audio Coding) compressed audio data without decoding it to audio signals. We focus on the spectral data stored in the bitstream, which represent the flattened MDCT (Modified Discrete Cosine Transform) of an audio signal. To compute the musical feature, we extract the spectral data and apply the Discrete Wavelet Transform (DWT) to it. For musical genre classification, we use discriminant analysis as the classifier. We experimented on 1,498 AAC-compressed audio files collected from 10 musical genres and evaluated the performance of the musical feature. The maximum correct classification ratio was 81.24%. The experiments showed that the musical feature based on the spectral data in the bitstream performed well for genre classification in the MPEG-4 AAC compressed domain.

