A Comparison of Human, Automatic and Collaborative Music Genre Classification and User Centric Evaluation of Genre Classification Systems

Seyerlehner, Klaus; Widmer, Gerhard; Knees, Peter

doi:10.1007/978-3-642-27169-4_9

Cited by 11 publications

(16 citation statements)

References 6 publications

“…The work proposing AdaBFFs (Bergstra et al 2006a), SRCAM (Panagakis et al 2009b), and the features of MAPsCAT (Andén and Mallat 2011), present only classification accuracy. Furthermore, based on classification accuracy, Seyerlehner et al (2010) argue that the performance gap between MGR systems and humans is narrowing; and in this issue, Humphrey et al conclude "progress in content-based music informatics is plateauing" (Humphrey et al 2013). Figure 2 shows that with respect to the classification accuracies in GTZAN reported in 83 published works (Sturm 2013b), those of AdaBFFs, SRCAM, and MAPsCAT lie above what is reported best in half of this work.…”

Section: Evaluating the Performance Statistics of MGR Systems

mentioning

confidence: 64%

“…The work by Vatolkin (2012) provides a comparison of various performance statistics for music classification. Other works (Berenzweig et al 2004; Craft et al 2007; Craft 2007; Lippens et al 2004; Wiggins 2009; Seyerlehner et al 2010; Sturm 2012b) argue for measuring performance in ways that take into account the natural ambiguity of music genre and similarity. For instance, we (Sturm 2012b), Craft et al (2007) and Craft (2007) argue for richer experimental designs than having a system apply a single label to music with a possibly problematic "ground truth."…”

Section: Evaluation in Music Genre Recognition Research

mentioning

confidence: 99%

“…Typically, formally justifying a misclassification as an error is a task research in MGR often defers to the "ground truth" of a dataset, whether created by a listener (Tzanetakis and Cook 2002), the artist (Seyerlehner et al 2010), music vendors (Gjerdingen and Perrott 2008; Ariyaratne and Zhang 2012), the collective agreement of several listeners (Lippens et al 2004; García et al 2007), professional musicologists (Abeßer et al 2012), or multiple tags given by an online community (Law 2011). Table 2 shows the datasets used by references in our survey (Sturm 2012a).…”

Section: Features

mentioning

confidence: 99%

“…Genres which are assumed to be very different, like Metal and Classic, were never confused." The human-like confusion tables found in MGR work, as well as the ambiguity between music genres, motivate evaluating MGR systems by considering as less troublesome the confusions we expect from humans (Craft 2007; Lippens et al 2004; Seyerlehner et al 2010).…”

Section: Evaluating Performance in Particular Classes

mentioning

confidence: 99%

“…We show these FoMs - all of which have been used in the past to rank MGR systems, e.g., Chai and Vercoe (2001), Tzanetakis and Cook (2002), Aucouturier and Pachet (2003), Burred and Lerch (2004), Turnbull and Elkan (2005), Flexer (2006), DeCoro et al (2007), Benetos and Kotropoulos (2008), Panagakis et al (2009b), Bergstra et al (2010), Fu et al (2011) and Ren and Jang (2012), citing one work from each year since 2001 - do not reliably reflect the capacity of an MGR system to recognize genre. While these claims have not been made overt in any of the 467 references we survey (Sturm 2012a), shades of it have appeared before (Craft 2007; Lippens et al 2004; Wiggins 2009; Seyerlehner et al 2010; Sturm 2012b), which argue for evaluating performance in ways that account for the ambiguity of genre being in large part a subjective construction (Fabbri 1982; Frow 2005). We go further and argue that the evaluation of MGR systems - the experimental designs, the datasets, and the FoMs - and indeed, the development of future systems, must embrace the fact that the recognition of genre is to a large extent a musical problem, and must be evaluated as such.…”

mentioning

confidence: 92%

Classification accuracy is not enough

Sturm

2013

J Intell Inf Syst

We argue that an evaluation of system behavior at the level of the music is required to usefully address the fundamental problems of music genre recognition (MGR), and indeed other tasks of music information retrieval, such as autotagging. A recent review of works in MGR since 1995 shows that most (82 %) measure the capacity of a system to recognize genre by its classification accuracy. After reviewing evaluation in MGR, we show that neither classification accuracy, nor recall and precision, nor confusion tables, necessarily reflect the capacity of a system to recognize genre in musical signals. Hence, such figures of merit cannot be used to reliably rank, promote or discount the genre recognition performance of MGR systems if genre recognition (rather than identification by irrelevant confounding factors) is the objective. This motivates the development of a richer experimental toolbox for evaluating any system designed to intelligently extract information from music signals.
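The figures of merit named in this abstract (classification accuracy, recall and precision) are all derived from a confusion table. The following is a minimal sketch of that arithmetic only; the function name `figures_of_merit` and the toy three-class counts are invented for illustration and are not taken from any of the cited works.

```python
from typing import Dict, List


def figures_of_merit(confusion: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy, per-class recall and per-class precision.

    confusion[i][j] is the number of excerpts whose "ground truth" class
    is i and which the system labeled as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    recall = []
    precision = []
    for k in range(n):
        true_k = sum(confusion[k])                       # row sum: excerpts truly in class k
        pred_k = sum(confusion[i][k] for i in range(n))  # column sum: excerpts labeled k
        recall.append(confusion[k][k] / true_k if true_k else 0.0)
        precision.append(confusion[k][k] / pred_k if pred_k else 0.0)

    return {
        "accuracy": correct / total if total else 0.0,
        "recall": recall,
        "precision": precision,
    }


# Hypothetical counts for three classes (illustrative values only).
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 3, 7],
]
fom = figures_of_merit(toy_confusion)
print(fom)
```

All three statistics depend only on label counts. That is consistent with the abstract's point that they cannot, by themselves, show whether a system's labels come from genre or from confounding factors.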

Music Genre Classification Revisited: An In-Depth Examination Guided by Music Experts

Pálmason, Jónsson, Schedl et al.

2018

Music Technology With Swing

A Survey of Evaluation in Music Genre Recognition

Sturm

2014

Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation

Abstract. Much work is focused upon music genre recognition (MGR) from audio recordings, symbolic data, and other modalities. While reviews have been written of some of this work before, no survey has been made of the approaches to evaluating approaches to MGR. This paper compiles a bibliography of work in MGR, and analyzes three aspects of evaluation: experimental designs, datasets, and figures of merit.
