Research applying machine learning to music modeling and generation typically proposes model architectures, training methods and datasets, and gauges system performance using quantitative measures like sequence likelihoods and/or qualitative listening tests. Rarely does such work explicitly question and analyse its usefulness for, and impact on, real-world practitioners, and then build on those outcomes to inform the development and application of machine learning. This article attempts to do these things for machine learning applied to music creation. Together with practitioners, we develop and use several applications of machine learning for music creation, and present a public concert of the results. We reflect on the entire experience to arrive at several ways of advancing these and similar applications of machine learning to music creation.
The application of artificial intelligence (AI) to music stretches back many decades and presents numerous opportunities, from the recommendation of recorded music in massive commercial archives to the (semi-)automated creation of music. Due to unparalleled access to music data and effective learning algorithms running on high-powered computational hardware, AI is now producing surprising outcomes in a domain fully entrenched in human creativity, not to mention a source of revenue around the globe. These developments call for a close inspection of what is occurring, and consideration of how it is changing, and can change, our relationship with music for better and for worse. This article looks at AI applied to music from two perspectives: copyright law and engineering praxis. It grounds its discussion in the development and use of a specific application of AI in music creation, which raises further and unanticipated questions. Most of the questions collected in this article remain open, as their answers are not yet clear, but they are nonetheless important to consider as AI technologies develop and are applied more widely to music, not to mention other domains centred on human creativity.
We extend our evaluation of generative models of music transcriptions first presented in Sturm, Santos, Ben-Tal, and Korshunova (2016). We evaluate the models in five different ways: 1) at the population level, comparing statistics of 30,000 generated transcriptions with those of over 23,000 training transcriptions; 2) at the practice level, examining the ways in which specific generated transcriptions are successful as music compositions; 3) as a "nefarious tester", seeking the music knowledge limits of the models; 4) in the context of assisted music composition, using the models to create music within the conventions of the training data; and finally, 5) taking the models to real-world music practitioners. Our work attempts to demonstrate new approaches to evaluating the application of machine learning methods to modelling and making music, and the importance of taking the results back to the realm of music practice to judge their usefulness. Our datasets and software are open and available at https://github.com/IraKorshunova/folk-rnn.
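To illustrate what a population-level comparison like (1) can look like in practice, here is a minimal sketch, not the authors' actual evaluation code: it compares token-frequency distributions of two transcription corpora using total variation distance. The file names and whitespace tokenization are assumptions for illustration only.

```python
# Minimal sketch of a population-level corpus comparison (illustrative;
# not the folk-rnn evaluation pipeline).
from collections import Counter

def token_distribution(transcriptions):
    """Relative frequency of each whitespace-separated token across a corpus."""
    counts = Counter()
    for t in transcriptions:
        counts.update(t.split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(tok, 0.0) - q.get(tok, 0.0)) for tok in support)

# Hypothetical corpora: one transcription per line in each file.
with open("training.txt") as f:
    training = f.read().splitlines()
with open("generated.txt") as f:
    generated = f.read().splitlines()

dist = total_variation(token_distribution(training), token_distribution(generated))
print(f"Total variation distance between corpora: {dist:.4f}")
```

A small distance would suggest the generated population matches the training population on this statistic; any single statistic, of course, captures only one facet of the comparison.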
To identify a perceptually valid measure of rhythm complexity, we applied five measures from information theory and algorithmic complexity to 48 artificially generated rhythmic sequences. We compared these measurements with human implicit and explicit complexity judgments obtained from a listening experiment, in which 32 participants guessed the last beat of each sequence. We also investigated the modulating effects of musical expertise and general pattern identification ability. Entropy rate was correlated with implicit and explicit judgments, Kolmogorov complexity was highly correlated with explicit judgments, and scores on the implicit task were correlated with self-assessed musical perceptual abilities. A logistic regression showed main effects of entropy rate and musical training, and an interaction between entropy rate and musical training. These results indicate that information-theoretic concepts capture some salient features of human rhythm perception, and confirm the influence of musical expertise on the perception of rhythm complexity.
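As a rough illustration of two of these measures, the sketch below estimates the entropy rate of a binary onset sequence with a first-order Markov approximation, and proxies Kolmogorov complexity (which is uncomputable) by compressed length. The paper's exact estimators may differ, and the example rhythm is invented.

```python
# Illustrative complexity estimators (assumptions, not the paper's code):
# entropy rate via a first-order Markov approximation, and Kolmogorov
# complexity approximated by zlib-compressed length, a standard proxy.
import math
import zlib
from collections import Counter

def entropy_rate_markov1(seq):
    """Conditional entropy H(X_t | X_{t-1}) in bits, from bigram counts."""
    bigrams = Counter(zip(seq, seq[1:]))
    unigrams = Counter(seq[:-1])
    n = len(seq) - 1
    h = 0.0
    for (a, b), c in bigrams.items():
        p_ab = c / n                   # joint probability of bigram (a, b)
        p_b_given_a = c / unigrams[a]  # conditional probability of b after a
        h -= p_ab * math.log2(p_b_given_a)
    return h

def compression_complexity(seq):
    """Compressed length in bytes as a crude Kolmogorov-complexity proxy."""
    return len(zlib.compress(bytes(seq)))

rhythm = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # invented onset pattern
print(entropy_rate_markov1(rhythm), compression_complexity(rhythm))
```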
Reflections on synaesthesia, perception and cognition. In this article we consider three still-current questions: the relationship between the physical world and the perceived world, the difficulty of explaining individual differences in the perception of the surrounding world, and the puzzle of understanding other minds. By examining the relationship between synaesthesia and hallucinations, and between hallucinations and normal perception, we show that all of these phenomena have far more in common than was previously supposed. We then question the plausibility of a functional analysis of synaesthesia, and examine the mechanisms of different types of ordinary and extraordinary perception. We argue that mechanisms of the same kind as those involved in synaesthesia may operate in a whole range of perceptual and cognitive phenomena, and we show the usefulness of such an approach in view of the ubiquity, in human cognition, of processes that formally resemble synaesthesia.
Kudac (Kingston University Digital Arts Collective) is an electronic improvisation ensemble that brings staff and students together for weekly musicking with technology, incorporating resources ranging from conventional instruments, to computers, to hacked circuit boards. A central element of the ensemble from its inception has been its democratic approach: staff and students explore the musical possibilities and challenges together and gradually mould their practice through a free exchange. In this article we consider the contribution of this ensemble in several overlapping domains: in relation to the individual students, in the context of a higher education music department, and at the intersection of research and teaching. We first survey the structure and activities of the ensemble, contextualizing this with reference to existing research in the fields of laptop performance, free improvisation and musical identity formation. We use this as a platform for tracing how such an ensemble may aid the social construction and shaping of creative identities at both an individual and collective level. We then examine the opportunities and challenges for a music department hosting such an ensemble before highlighting areas for future study.
Despite the widespread use of the term gesture in writings about music, the term is not defined in most musical dictionaries. Moreover, as this paper shows, the term is employed by different writers in a wide variety of ways. One common use of the term refers to sonic instances that are close analogies of physical gestures. These could be termed expressive unit gestures, which, like physical gestures, are perceived as short, unified, expressive events. To enable a more detailed study of these, this paper outlines a systematic approach to their description. The HamNoSys notation was developed for the systematic description of sign language: its gestures are notated through a systematic profiling of the actions involved (hand shapes, movement types, etc.). By analogy, a musical gesture can be described through its auditory properties, such as accent patterns, pitch contour, register, and so forth. This paper suggests that studying these expressive unit gestures offers a way of linking the dynamics of the music with expressive potential and can, therefore, contribute to an experiential account of music; could lead to new methods of investigating listeners' engagement with music; and could potentially offer new ideas in the field of music information retrieval.
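As a loose illustration of such profiling, the sketch below encodes a gesture as a record of auditory properties, by analogy with HamNoSys-style action profiles. The field names and values are invented for illustration and are not drawn from the paper.

```python
# Hypothetical profile of an "expressive unit gesture" by its auditory
# properties, loosely analogous to HamNoSys profiling of signed gestures.
from dataclasses import dataclass

@dataclass
class GestureProfile:
    accent_pattern: list[int]  # relative accent strength per event
    pitch_contour: str         # e.g. "rising", "falling", "arch"
    register: str              # e.g. "low", "middle", "high"
    duration_s: float          # approximate span of the unified event

g = GestureProfile(accent_pattern=[2, 1, 1, 3], pitch_contour="arch",
                   register="high", duration_s=1.8)
print(g)
```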
A goal of sonification research is the intuitive audio representation of complex, multidimensional data. The authors present two facets of this research that may provide insight into the creative process. First, they discuss aspects of categorical perception in non-verbal auditory scene analysis and propose that these characteristics are simplified models of creative engagement with sound. Second, they describe the use of sonified data in musical compositions by each of the authors and observe aspects of the creative process in the purely aesthetic use of sonified statistical data.

Creative Listening

Recent research confirms music theorists' speculation that listening to music involves an impressive amount of mental processing [1]. These data-reductive processes facilitate interpretation by segmenting the signal and organizing the segments into numerous categories. Within a relatively brief period of time, listeners, trained and untrained alike, make judgments regarding a wide range of factors such as genre, idiom, metric inference, tonal orientation and emotional type. From these "snap" judgments, contexts are formulated and an intricate network of expectations is generated [2]. While we can speculate about many of the methods listeners employ in their creative engagement with musical sound, the complexity of musical listening, particularly with the emotional associations it carries, seems daunting.

It is, perhaps, somewhat less daunting to consider similar processes of categorical perception of acoustic signals when the listening task is purposeful, as is the case with speech perception or a non-verbal, highly specific and directed listening task such as the auditory detection of timing irregularities when tuning an automobile engine. In performing the latter task, "knocks," "rattles," "pings" and other categorized sounds allow for the detection of problems in an unseen motor. Detection is possible because the limited categories are easy to learn and the task becomes reasonably intuitive. Auditory analysis and interpretation of this type is, in essence, a greatly simplified instance of musical listening. The fundamental tasks, segregating intrinsic from extrinsic sound, categorizing the intrinsically relevant sounds, and integrating temporal information to arrive at an interpretation, exist in both types of listening. Our research explores the development of similarly intuitive methods and tools for categorizing and interpreting sonified data. Sonification of multidimensional data finds potential uses in scientific, industrial and medical fields.

Consider, for example, one of the earliest instruments built to facilitate sonification of medical data. From the outset, René Laennec's invention of the stethoscope in 1819 was meant to interpret multidimensional auditory data. Mediated auscultation with the aid of the stethoscope was, in the inventor's terms, able to interpret the integrated sounds "not only of the action of the heart, but of every species of sound produced by the motion of all t...