ClaviNet: Generate Music With Different Musical Styles

Lim, Yu-Quan; Chan, Chee Seng; Loo, Fung Ying

doi:10.1109/mmul.2020.3046491

Cited by 8 publications

(2 citation statements)

References 3 publications


“…The literature [12] analyzed the method of selecting the aesthetic value of music singing based on the perspective of new media, and the new media technology combined with online media and mobile media can realize the reform of music teaching and complete the classroom arrangement of singing as well as let students master the scientific way of vocal music. Literature [13] proposed continuous style embedding through the general formulation of variational self-encoder, compared two different methods of z integration into VAE, combined with a deep learning model to control the training of the dataset for generating musical styles and better music samples using a baseline model of discrete style labels. In the literature [14], a system of vocal aerodynamics was used in a study to determine the relationship between the vertical phase difference and the vocal gate efficiency of musical theater singers.…”

Section: Literature Review (mentioning)

confidence: 99%

Analysis of the use of pop singing in musical theater singing based on data analysis

2023

Applied Mathematics and Nonlinear Sciences


In this paper, we first systematically sorted out the characteristics of pop singing and musical theater singing and explored the generation of emotion and melody application of pop singing in musical theater singing. Then, the gray GM(1,1) model is analyzed, and a gray data mining model based on data analysis is constructed to predict the application of pop singing in combination with the data mining model under data analysis. Finally, the prediction analysis was conducted to use characterization, timbre and style in pop singing, respectively. The results showed that the score of character building was 90, and the errors between the predicted and actual values of timbre and style were between ±0.1 and ±1, respectively, which were within an acceptable range.


“…In total, our vocabulary for the piano contains 730 unique tokens. The songs in our dataset have ∼95 bars on average, which translate to 5,249 tokens per song on average using the piano representation. Accordingly, a 512-token sequence employed in model training (i.e., x_{1:T}) contains about nine bars on average.…”

Section: B Token Representation (mentioning)

confidence: 99%
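The "about nine bars" figure in the statement above follows from the two averages it quotes. A minimal sketch of that arithmetic, assuming the rounded values in the quote (the variable names are illustrative and do not come from the cited paper):

```python
# Averages quoted in the citation statement; "~95" is approximate,
# so the results below are estimates rather than exact dataset values.
avg_bars_per_song = 95
avg_tokens_per_song = 5249
sequence_length = 512  # tokens per training sequence

# Average number of tokens needed to encode one bar.
tokens_per_bar = avg_tokens_per_song / avg_bars_per_song

# Average number of bars covered by one training sequence.
bars_per_sequence = sequence_length / tokens_per_bar

print(f"{tokens_per_bar:.1f} tokens per bar")        # 55.3 tokens per bar
print(f"{bars_per_sequence:.1f} bars per sequence")  # 9.3 bars per sequence
```

At roughly 55 tokens per bar, a 512-token window spans a little over nine bars, which matches the quoted estimate.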

Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer

Shih, Wu, Zalkow et al. 2021

Preprint


Attention-based Transformer models have been increasingly employed for automatic music generation. To condition the generation process of such a model with a user-specified sequence, a popular approach is to take that conditioning sequence as a priming sequence and ask a Transformer decoder to generate a continuation. However, this prompt-based conditioning cannot guarantee that the conditioning sequence would develop or even simply repeat itself in the generated continuation. In this paper, we propose an alternative conditioning approach, called theme-based conditioning, that explicitly trains the Transformer to treat the conditioning sequence as a thematic material that has to manifest itself multiple times in its generation result. This is achieved with two main technical contributions. First, we propose a deep learning-based approach that uses contrastive representation learning and clustering to automatically retrieve thematic materials from music pieces in the training data. Second, we propose a novel gated parallel attention module to be used in a sequence-to-sequence (seq2seq) encoder/decoder architecture to more effectively account for a given conditioning thematic material in the generation process of the Transformer decoder. We report on objective and subjective evaluations of variants of the proposed Theme Transformer and the conventional prompt-based baseline, showing that our best model can generate, to some extent, polyphonic pop piano music with repetition and plausible variations of a given condition.
