This paper presents a novel, data-driven language model that generates complete lyrics for a given input melody. Previously proposed lyrics generation models fail to capture the relationship between lyrics and melody, partly because lyrics-melody aligned data has been unavailable. In this study, we first propose a new practical method for creating a large collection of lyrics-melody aligned data, and then use it to create a collection of 1,000 lyrics-melody pairs annotated with precise syllable-note alignments and word/sentence/paragraph boundaries. Next, we provide a quantitative analysis of the correlation between word/sentence/paragraph boundaries in the lyrics and the structure of the melodies. Finally, we propose an RNN-based lyrics language model conditioned on a featurized melody. Experimental results show that the proposed model generates fluent lyrics while keeping the boundaries of the lyrics compatible with the structure of the melody.
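The kind of boundary analysis described above can be illustrated with a minimal sketch. It assumes that melodic structure is approximated by the length of the rest preceding each note, which the abstract does not specify; the function name, the bucket thresholds, and the toy records are all hypothetical and are not taken from the paper's data.

```python
from collections import defaultdict

def boundary_rate_by_rest(aligned):
    """Estimate how often each boundary type follows each kind of rest.

    `aligned` is a list of (rest_before_note_in_beats, boundary_label) pairs,
    one per syllable-note alignment. Labels and thresholds are assumptions.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for rest, label in aligned:
        if rest == 0:
            bucket = "no_rest"
        elif rest < 1:
            bucket = "short_rest"
        else:
            bucket = "long_rest"
        counts[bucket][label] += 1
    # Normalise the counts within each rest bucket into relative frequencies.
    return {
        bucket: {label: c / sum(labels.values()) for label, c in labels.items()}
        for bucket, labels in counts.items()
    }

# Hypothetical toy alignment: (rest length in beats, boundary before the syllable).
toy = [(0.0, "none"), (0.0, "word"), (0.5, "word"), (2.0, "sentence")]
print(boundary_rate_by_rest(toy))
```

A table of this shape makes it easy to check whether longer rests tend to coincide with larger lyric boundaries, which is the sort of relationship a melody-conditioned model would need to learn.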
We present an algorithm for song composition that uses the prosody of Japanese lyrics. Since Japanese is a "pitch accent" language, a listener's comprehension is strongly affected by the pitch motions of the speaker. For example, the meaning of the Japanese word "ha-shi" changes with the pitch: it means "bridge" with an upward pitch motion and "chopsticks" with the motion reversed. A melody attached to the lyrics causes an effect similar to the pitch accent. We can therefore assume that the pitches of Japanese lyrics constrain the pitch motions of the melody. Furthermore, the chord progression, rhythm, and accompaniment constrain the transitions and occurrences of the melody notes. If a suitable melody for the lyrics were given, it would satisfy these constraints. Conversely, we can compose a song by finding the melody that optimally satisfies them.
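One way to picture "finding the melody that optimally satisfies the constraints" is as a shortest-path search over candidate pitches. The sketch below uses Viterbi-style dynamic programming; the abstract does not name the search method, so this choice, the cost weights, and the function name `compose_melody` are assumptions for illustration only.

```python
def sign(x):
    return (x > 0) - (x < 0)

def compose_melody(motions, chords, pitches,
                   w_accent=2.0, w_chord=1.0, w_leap=0.1):
    """Pick one MIDI pitch per syllable at minimum total cost.

    motions[i]: desired pitch direction (+1, 0, -1) from syllable i to i+1,
                standing in for the prosody of the lyrics.
    chords[i]:  set of pitch classes allowed by the chord under syllable i.
    The weights are illustrative assumptions, not values from the paper.
    """
    n = len(chords)
    assert len(motions) == n - 1

    def local(i, p):
        # Penalise notes outside the current chord.
        return 0.0 if p % 12 in chords[i] else w_chord

    def trans(i, p, q):
        # Penalise motion against the pitch accent, plus a small leap cost.
        mismatch = 0.0 if sign(q - p) == motions[i] else w_accent
        return mismatch + w_leap * abs(q - p)

    cost = {p: local(0, p) for p in pitches}
    back = []
    for i in range(1, n):
        new_cost, pointers = {}, {}
        for q in pitches:
            best = min(pitches, key=lambda p: cost[p] + trans(i - 1, p, q))
            new_cost[q] = cost[best] + trans(i - 1, best, q) + local(i, q)
            pointers[q] = best
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the best final pitch.
    melody = [min(pitches, key=lambda p: cost[p])]
    for pointers in reversed(back):
        melody.append(pointers[melody[-1]])
    melody.reverse()
    return melody

# "ha-shi" over a C major chord: rising accent ("bridge") vs. falling ("chopsticks").
c_major = [{0, 4, 7}, {0, 4, 7}]
candidates = [60, 62, 64, 65, 67]
print(compose_melody([+1], c_major, candidates))
print(compose_melody([-1], c_major, candidates))
```

With these toy inputs the two accent patterns yield melodies moving in opposite directions, which mirrors how the pitch accent is meant to constrain the melodic contour.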
Implementation and Experimental Results
Orpheus is an automatic composition system that we implemented using a melody composition algorithm based on prosody. The system computes a melody from the input lyrics, given a choice of chord progression, rhythm pattern, and accompaniment instruments. We used the Galatea-Talk [4] text-to-speech engine to analyze the prosody of the Japanese lyrics, and an HMM singing voice synthesizer [5] to generate the vocal part. We also implemented the system as a web-based application. We conducted two experiments to evaluate the system. First, we asked a classical music composer to rate 59 generated songs on a five-point scale. Second, we made the system publicly available to collect comments from a large number of users on the internet. During a year of operation, users generated about 56,000 songs, and 1,378 people answered questions about Orpheus and the generated songs. The results are shown in Fig. 1 and Fig. 2. Of the respondents, about 70.8% found the generated songs attractive, and 84.9% had fun trying the system.