Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413519

Drum Synthesis and Rhythmic Transformation with Adversarial Autoencoders

Abstract: Creative rhythmic transformations of musical audio refer to automated methods for manipulating temporally relevant sounds in time. This paper presents a method for joint synthesis and rhythm transformation of drum sounds through the use of adversarial autoencoders (AAE). Users may navigate both the timbre and rhythm of drum patterns in audio recordings through expressive control over a low-dimensional latent space. The model is based on an AAE with Gaussian mixture latent distributions that introduce rhythm…

Cited by 5 publications
(4 citation statements)
References 10 publications
“…Latent variable models such as Generative Adversarial Networks (GANs) (Goodfellow, 2017), or Variational Auto-Encoders (VAEs) (Kingma & Welling, 2014), have been more widely used in this sense as they are faster and allow manipulating learned controls affecting high-level factors of variation in the generated data (Caillon & Esling, 2021; Aouameur et al., 2019; Engel et al., 2019). Specifically, GANs have shown promising results in drum sound synthesis (Nistal et al., 2020; Drysdale et al., 2021; Tomczak et al., 2020) and are generally superior to other generative methods in terms of speed and quality. Recently, Denoising Diffusion models have shown results on par with GANs (Ho et al., 2020) and were applied to drum sound synthesis obtaining unprecedented audio quality and diversity (Rouard & Hadjeres, 2021).…”
Section: Previous Work (mentioning)
confidence: 99%
“…Related tasks that have been attempted in the literature with deep learning include symbolic-domain generation of a monophonic drum track (i.e., kick drum only) of multiple bars [4], symbolic-domain drum pattern generation [22][23][24][25], symbolic-domain drum track generation as part of a multi-track MIDI [26][27][28][29], audio-domain one-shot drum hit generation [30][31][32][33][34], audio-domain generation of drum sounds of an entire drum kit of a single bar [35], and audio-domain drum loop generation [36]. Jukebox [7] generates a mixture of sounds that include drums, but not an isolated drum track.…”
Section: Related Work On Drum Generation (mentioning)
confidence: 99%
“…The calculation of the E_S envelopes is the same as for E_Ŝ. Following [43], envelope reconstruction of the transformations is evaluated with cosine similarity calculated between envelopes extracted from target and transformed recordings as follows:…”
Section: Timbral Reconstruction (mentioning)
confidence: 99%
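
The last citation statement above stops before its formula, so the citing paper's exact definition is not available here. As a minimal sketch, assuming the textbook definition of cosine similarity (dot product divided by the product of the Euclidean norms) and assuming both envelopes are sampled on the same time grid, the comparison could look like the following. The function name `cosine_similarity` and the example envelope values are hypothetical and are not taken from the cited work.

```python
import math
from typing import Sequence


def cosine_similarity(env_target: Sequence[float],
                      env_transformed: Sequence[float]) -> float:
    """Standard cosine similarity between two equal-length envelopes.

    Assumption: this is the textbook definition, not necessarily the
    exact formula used in the citing paper, which is elided in the source.
    """
    if len(env_target) != len(env_transformed):
        raise ValueError("envelopes must have the same number of frames")

    # Dot product of the two envelopes.
    dot = sum(a * b for a, b in zip(env_target, env_transformed))

    # Euclidean norm of each envelope.
    norm_target = math.sqrt(sum(a * a for a in env_target))
    norm_transformed = math.sqrt(sum(b * b for b in env_transformed))

    # Convention chosen for this sketch: an all-zero (silent) envelope
    # has no direction, so report zero similarity instead of dividing by 0.
    if norm_target == 0.0 or norm_transformed == 0.0:
        return 0.0

    return dot / (norm_target * norm_transformed)


# Hypothetical envelopes: the second is a scaled copy of the first,
# so only its overall level differs, not its shape.
target = [0.0, 0.8, 0.4, 0.1]
transformed = [0.0, 0.4, 0.2, 0.05]
similarity = cosine_similarity(target, transformed)
```

Because the measure normalizes by both norms, it responds to the shape of an envelope and not to its overall level, which suits a comparison between a target recording and a transformed one that may differ in loudness.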