2018
DOI: 10.48550/arxiv.1811.06633
Preprint
Generating Albums with SampleRNN to Imitate Metal, Rock, and Punk Bands

Abstract: This early example of neural synthesis is a proof-of-concept for how machine learning can drive new types of music software. Creating music can be as simple as specifying a set of music influences on which a model trains. We demonstrate a method for generating albums that imitate bands in experimental music genres previously unrealized by traditional synthesis techniques (e.g. additive, subtractive, FM, granular, concatenative). Raw audio is generated autoregressively in the time domain using an unconditional Sa…
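The abstract describes generating raw audio autoregressively in the time domain: each quantized sample is drawn from a distribution conditioned on all previous samples. A minimal sketch of that sampling loop is below; the `dummy_next_sample_dist` function is a hypothetical stand-in for a trained model (the paper's actual model is a multi-tier SampleRNN, not shown here), and the 256-level quantization is an assumption matching common 8-bit setups.

```python
import numpy as np

QUANT_LEVELS = 256  # 8-bit quantized audio, a common choice for SampleRNN-style models

def dummy_next_sample_dist(history: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained model: returns a probability
    distribution over the next quantized sample given the history."""
    rng = np.random.default_rng(len(history))  # deterministic per step, for illustration
    logits = rng.normal(size=QUANT_LEVELS)
    probs = np.exp(logits - logits.max())      # softmax over the quantization levels
    return probs / probs.sum()

def generate(n_samples: int, seed: int = 0) -> np.ndarray:
    """Unconditional autoregressive generation: sample one value at a time,
    feeding everything generated so far back in as context."""
    rng = np.random.default_rng(seed)
    out = np.zeros(n_samples, dtype=np.int64)
    for t in range(1, n_samples):
        probs = dummy_next_sample_dist(out[:t])
        out[t] = rng.choice(QUANT_LEVELS, p=probs)  # sample, rather than argmax, for variety
    return out

audio = generate(64)  # 64 quantized samples in [0, 256)
```

The sequential loop is why sample-level autoregressive generation is slow: each of the tens of thousands of samples per second of audio requires a full model evaluation that depends on the previous output.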

Cited by 4 publications (7 citation statements)
References 2 publications (4 reference statements)
“…Audio synthesis technologies for music have been researched for many years, ranging from synthesizers generating pitched waveforms, to singing voice synthesizers conditioned on melody and text, to deep learning based models capable of generating entire songs [5,7]. For the context of this paper, we restrict ourselves to a description of generative models as deep learning based architectures used for musical audio synthesis.…”
Section: Related Work
confidence: 99%
“…The use of dilated causal convolutions allows the architecture to model longer term temporal dependencies between samples in an audio waveform than in the SampleRNN architecture. This architecture has subsequently been adapted for musical generation like singing voice synthesis conditioned on lyrics [11], and instrument sound generation conditioned on the pitch and latent representations of timbre [3,5,9]. While the output of these models is subjectively similar to natural-sounding samples, the sequential nature of the model means that the processing time for generation is quite high, unless high-resource processing units are available.…”
Section: Related Work
confidence: 99%
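The statement above attributes the longer receptive field to dilated causal convolutions: output at time t depends only on inputs at times ≤ t, and spacing the kernel taps by a dilation factor lets stacked layers (dilations 1, 2, 4, …) cover exponentially more context per layer. A small NumPy sketch of one such layer, under the assumption of a kernel of size 2 and no nonlinearities (the real WaveNet-style blocks add gated activations and residual connections):

```python
import numpy as np

def dilated_causal_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """x: (T,) signal, w: (K,) kernel. Left-pads with zeros so the output
    has length T and y[t] uses only x[t], x[t-d], x[t-2d], ... (causality)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        taps = xp[t + pad - np.arange(k) * dilation]  # x[t], x[t-d], ...
        y[t] = taps @ w
    return y

# Trace the receptive field with a unit impulse and a size-2 kernel of ones:
x = np.zeros(16)
x[0] = 1.0
w = np.ones(2)
y1 = dilated_causal_conv1d(x, w, dilation=1)
y2 = dilated_causal_conv1d(y1, w, dilation=2)
y3 = dilated_causal_conv1d(y2, w, dilation=4)
# After dilations 1, 2, 4 the impulse has spread over 8 output samples:
# three layers already "see" 8 timesteps, versus 4 for undilated kernels.
```

Generation is still sequential at inference time, which is the high processing cost the statement notes: the convolutional structure parallelizes training, not sample-by-sample synthesis.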
“…Deep learning models have a good ability to deal with challenging problems that are too complex for us to explain by means of simple and deterministic laws in closed forms. Some examples include the extraction of relevant information from images [1], image inpainting and denoising [2], natural language processing [3], the creation of music [4], and learning how to play a 3D role-playing game properly [5].…”
Section: Introduction
confidence: 99%