2018
DOI: 10.48550/arxiv.1811.06633
Preprint
Generating Albums with SampleRNN to Imitate Metal, Rock, and Punk Bands

Abstract: This early example of neural synthesis is a proof-of-concept for how machine learning can drive new types of music software. Creating music can be as simple as specifying a set of music influences on which a model trains. We demonstrate a method for generating albums that imitate bands in experimental music genres previously unrealized by traditional synthesis techniques (e.g. additive, subtractive, FM, granular, concatenative). Raw audio is generated autoregressively in the time domain using an unconditional Sa…
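The abstract describes generating raw audio autoregressively in the time domain: each quantized sample is drawn from a distribution conditioned on all previous samples. A minimal sketch of that sampling loop is below; the `dummy_next_sample_dist` function is a hypothetical stand-in for a trained model (the paper's actual model is a multi-tier SampleRNN, not shown here), and the 256-level quantization is an assumption matching common 8-bit setups.

```python
import numpy as np

QUANT_LEVELS = 256  # 8-bit quantized audio, a common choice for SampleRNN-style models

def dummy_next_sample_dist(history: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained model: returns a probability
    distribution over the next quantized sample given the history."""
    rng = np.random.default_rng(len(history))  # deterministic per step, for illustration
    logits = rng.normal(size=QUANT_LEVELS)
    probs = np.exp(logits - logits.max())      # softmax over the quantization levels
    return probs / probs.sum()

def generate(n_samples: int, seed: int = 0) -> np.ndarray:
    """Unconditional autoregressive generation: sample one value at a time,
    feeding everything generated so far back in as context."""
    rng = np.random.default_rng(seed)
    out = np.zeros(n_samples, dtype=np.int64)
    for t in range(1, n_samples):
        probs = dummy_next_sample_dist(out[:t])
        out[t] = rng.choice(QUANT_LEVELS, p=probs)  # sample, rather than argmax, for variety
    return out

audio = generate(64)  # 64 quantized samples in [0, 256)
```

The sequential loop is why sample-level autoregressive generation is slow: each of the tens of thousands of samples per second of audio requires a full model evaluation that depends on the previous output.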

Cited by 4 publications (7 citation statements)
References 2 publications (4 reference statements)
“…Audio synthesis technologies for music have been researched for many years, ranging from synthesizers generating pitched waveforms, to singing voice synthesizers conditioned on melody and text, to deep learning based models capable of generating entire songs [5,7]. For the context of this paper, we restrict ourselves to a description of generative models as deep learning based architectures used for musical audio synthesis.…”
Section: Related Work
confidence: 99%
“…The use of dilated causal convolutions allows the architecture to model longer term temporal dependencies between samples in an audio waveform than in the SampleRNN architecture. This architecture has subsequently been adapted for musical generation like singing voice synthesis conditioned on lyrics [11], and instrument sound generation conditioned on the pitch and latent representations of timbre [3,5,9]. While the output of these models is subjectively similar to natural-sounding samples, the sequential nature of the model means that the processing time for generation is quite high, unless high-resource processing units are available.…”
Section: Related Work
confidence: 99%
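The statement above attributes the longer receptive field to dilated causal convolutions: output at time t depends only on inputs at times ≤ t, and spacing the kernel taps by a dilation factor lets stacked layers (dilations 1, 2, 4, …) cover exponentially more context per layer. A small NumPy sketch of one such layer, under the assumption of a kernel of size 2 and no nonlinearities (the real WaveNet-style blocks add gated activations and residual connections):

```python
import numpy as np

def dilated_causal_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """x: (T,) signal, w: (K,) kernel. Left-pads with zeros so the output
    has length T and y[t] uses only x[t], x[t-d], x[t-2d], ... (causality)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        taps = xp[t + pad - np.arange(k) * dilation]  # x[t], x[t-d], ...
        y[t] = taps @ w
    return y

# Trace the receptive field with a unit impulse and a size-2 kernel of ones:
x = np.zeros(16)
x[0] = 1.0
w = np.ones(2)
y1 = dilated_causal_conv1d(x, w, dilation=1)
y2 = dilated_causal_conv1d(y1, w, dilation=2)
y3 = dilated_causal_conv1d(y2, w, dilation=4)
# After dilations 1, 2, 4 the impulse has spread over 8 output samples:
# three layers already "see" 8 timesteps, versus 4 for undilated kernels.
```

Generation is still sequential at inference time, which is the high processing cost the statement notes: the convolutional structure parallelizes training, not sample-by-sample synthesis.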
“…Deep learning models have a good ability to deal with challenging problems that are too complex for us to explain by means of simple and deterministic laws in closed forms. Some examples include the extraction of relevant information from images [1], image inpainting and denoising [2], natural language processing [3], the creation of music [4], and learning how to play a 3D role-playing game properly [5].…”
Section: Introduction
confidence: 99%