“…We show these FoMs (all of which have been used in the past to rank MGR systems, e.g., Chai and Vercoe (2001), Tzanetakis and Cook (2002), Aucouturier and Pachet (2003), Burred and Lerch (2004), Turnbull and Elkan (2005), Flexer (2006), DeCoro et al (2007), Benetos and Kotropoulos (2008), Panagakis et al (2009b), Bergstra et al (2010), Fu et al (2011) and Ren and Jang (2012), citing one work from each year since 2001) do not reliably reflect the capacity of an MGR system to recognize genre. While these claims have not been made overt in any of the 467 references we survey (Sturm 2012a), shades of it have appeared before (Craft 2007; Lippens et al 2004; Wiggins 2009; Seyerlehner et al 2010; Sturm 2012b), which argue for evaluating performance in ways that account for the ambiguity of genre being in large part a subjective construction (Fabbri 1982; Frow 2005). We go further and argue that the evaluation of MGR systems (the experimental designs, the datasets, and the FoMs), and indeed the development of future systems, must embrace the fact that the recognition of genre is to a large extent a musical problem, and must be evaluated as such.…”