Research applying machine learning to music modeling and generation typically proposes model architectures, training methods and datasets, and gauges system performance using quantitative measures like sequence likelihoods and/or qualitative listening tests. Rarely does such work explicitly question and analyse its usefulness for and impact on real-world practitioners, and then build on those outcomes to inform the development and application of machine learning. This article attempts to do these things for machine learning applied to music creation. Together with practitioners, we develop and use several applications of machine learning for music creation, and present a public concert of the results. We reflect on the entire experience to arrive at several ways of advancing these and similar applications of machine learning to music creation.
Sound sample indexing usually deals with the recognition of the source/cause that produced the sound. For abstract sounds, sound effects, and unnatural or synthetic sounds, this cause is usually unknown or unrecognizable. An efficient description of such sounds has been proposed by Schaeffer under the name morphological description. Part of this description consists in characterizing a sound by matching the temporal evolution of its acoustic properties to a set of profiles. In this work, we consider three morphological descriptions: dynamic profiles (ascending, descending, ascending/descending, stable, impulsive), melodic profiles (up, down, stable, up/down, down/up) and complex-iterative sound description (non-iterative, iterative, grain, repetition). We study the automatic indexing of a sound into these profiles. Because this automatic indexing is difficult using standard audio features, we propose new audio features to perform the task. The dynamic profiles are estimated by modeling the loudness of a sound over time with a second-order B-spline and deriving features from this model. The melodic profiles are estimated by tracking over time the perceptual filter with the maximum excitation; a function derived from this track is then modeled with a second-order B-spline, from which features are again derived. The description of complex-iterative sounds is obtained by estimating the amount of repetition and the period of the repetition, both computed from an audio similarity function derived from an MFCC similarity matrix. The proposed audio features are then tested for automatic classification. We consider three classification tasks corresponding to the three profiles and, in each case, compare the results with those obtained using standard audio features.
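The dynamic-profile estimation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the `dynamic_profile` function, the interpolating fit, and the slope threshold are assumptions made for the sketch, and the "impulsive" class is omitted because it would additionally require absolute duration information.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def dynamic_profile(loudness):
    """Classify a loudness envelope into a dynamic profile by fitting a
    second-order (quadratic) B-spline and reading slopes off the fit.
    Thresholds and time normalization are illustrative assumptions."""
    y = np.asarray(loudness, dtype=float)
    t = np.linspace(0.0, 1.0, len(y))            # normalized time axis
    spline = UnivariateSpline(t, y, k=2, s=0.0)  # k=2 -> second-order B-spline
    # Average slopes over the first and second halves of the sound.
    slope1 = (spline(0.45) - spline(0.05)) / 0.4
    slope2 = (spline(0.95) - spline(0.55)) / 0.4
    eps = 0.1  # assumed flatness threshold (loudness units per unit time)
    if slope1 > eps and slope2 < -eps:
        return "ascending/descending"
    if slope1 > eps:
        return "ascending"
    if slope1 < -eps and slope2 < eps:
        return "descending"
    return "stable"
```

A rising loudness ramp maps to "ascending", a falling ramp to "descending", and a triangle-shaped envelope to "ascending/descending".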
Morphological description was proposed by Pierre Schaeffer. It consists in describing sounds by matching the temporal evolution of their acoustical properties to a set of profiles. This kind of description is especially useful for indexing sounds with unknown cause, such as SoundFX. The present work deals with the automatic estimation of this morphological description from audio signal analysis. Three morphological descriptions are considered: dynamic profiles (ascending, descending, ascending/descending, stable, impulsive), melodic profiles (ascending, descending, fixed, up/down, down/up), and repetition profiles. For each case we present the most appropriate audio features (loudness, pitch, pitch salience, temporal increase/decrease, lag-matrix periodicity, ...) and mapping algorithms (slopes computed from spline approximations of temporal profiles, ...) used to automatically estimate the profiles. We demonstrate the use of these descriptions for automatic indexing (using decision trees) and search-by-similarity of SoundFX.
Deep learning has given AI-based methods for music creation a boost over the past years. An important challenge in this field is to balance user control and autonomy in music generation systems. In this work, we present BassNet, a deep learning model for generating bass guitar tracks based on musical source material. An innovative aspect of our work is that the model is trained to learn a temporally stable two-dimensional latent space variable that offers interactive user control. We empirically show that the model can disentangle bass patterns that require sensitivity to harmony, instrument timbre, and rhythm. An ablation study reveals that this capability is due to the temporal stability constraint on latent space trajectories during training. We also demonstrate that models trained on pop/rock music learn a latent space that offers control over the diatonic characteristics of the output, among other things. Lastly, we present and discuss generated bass tracks for three different music fragments. The work presented here is a step toward the integration of AI-based technology into the workflow of musical content creators.
This article introduces a model called "System & Contrast" (S&C), which aims at describing the inner organization of structural segments within music pieces in terms of: (i) a carrier system, i.e. a sequence of morphological elements forming a multi-dimensional network of self-deducible syntagmatic relationships, and (ii) a contrast, i.e. a substitutive element, usually the last one, which partly departs from the logic implied by the rest of the system. With a primary focus on pop music, the S&C model provides a framework to describe internal implication patterns in musical segments by encoding similarities and relations between its constitutive elements so as to minimize the complexity of the resulting description. It is applicable at several timescales and to a wide variety of musical dimensions in a polymorphous way, therefore offering an attractive meta-description of different types of musical contents. It has been used as a central component in the creation of a set of annotations for 380 pop songs (Bimbot, Sargent, Deruty, Guichaoua & Vincent, 2014). This article formalizes the S&C model, illustrates how it applies to music, and establishes its filiation with Narmour's Implication-Realization model (Narmour, 1990, 1992).
Although the use of AI technology for music production is still in its infancy, it has the potential to make a lasting impact on the way we produce music. In this paper we focus on the design and use of AI music tools for the production of contemporary Popular Music, in particular genres involving studio technology as part of the creative process. First we discuss how music production practices associated with those genres can differ significantly from traditional views of how a musical work is created, and how this affects AI music technology. We argue that, given the role of symbolic representations in this context, as well as the integration of composition activities with editing and mixing, audio-based AI tools are better suited to support the artist's creative workflow than purely piano-roll/MIDI-based tools. Then we report on collaborations with professional artists, in which we look at how various AI tools are used in practice to produce music. We identify usage patterns as well as issues and challenges that arise in practical use of the tools. Based on this we formulate some recommendations and validation criteria for the development of AI technology for contemporary Popular Music.
The tremendous success of rock music in the second half of the 20th century has boosted the sophistication of production and mixing techniques for this music genre. However, there is no unified theory of mixing from the viewpoint of sound engineering. In this paper, we highlight relationships between loudness and spectrum in individual tracks, established during the process of mixing. To do so, we introduce an ad hoc, three-dimensional model of the spectrum of a track. These dimensions are derived from an optimal monitoring level, that is, the level that maximizes the number of frequency bands at the same, maximum loudness. We study a corpus of 55 rock multi-tracks and correlate the model with the loudness of the tracks. We suggest that (1) at high monitoring levels and/or on high-end monitors, track loudness is a linear function of its spectral centroid, and (2) at low monitoring levels and/or on budget monitors, a track's optimal monitoring level is a linear function of its loudness. This indicates that under good listening conditions, human mixers tend to focus on spectral balance, whereas under bad conditions, they favor individual track comprehension. We discuss the implications of our results for automatic mixing.
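The first suggested relationship, track loudness as a linear function of spectral centroid, can be illustrated with a minimal sketch. The helper functions below are hypothetical, not the paper's model; they assume per-track magnitude spectra and loudness values are already available.

```python
import numpy as np

def spectral_centroid(magnitude, freqs):
    """Amplitude-weighted mean frequency of a magnitude spectrum (Hz)."""
    m = np.asarray(magnitude, dtype=float)
    f = np.asarray(freqs, dtype=float)
    return float(np.sum(f * m) / np.sum(m))

def fit_loudness_vs_centroid(centroids, loudness):
    """Least-squares line loudness ~ a * centroid + b across tracks,
    mirroring suggestion (1) for high monitoring levels."""
    a, b = np.polyfit(centroids, loudness, 1)
    return float(a), float(b)
```

Fitting such a line per mix, and inspecting the residuals, is one simple way to probe how strongly a given mix follows the reported tendency.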