We present a novel attention-based system for singing synthesis. Starting from a musical score with notes and lyrics, we build a phoneme-level multi-stream note embedding. This embedding encodes the information contained in the score about pitch, duration, and the phonemes to be pronounced on each note. The note representation is used to condition an attention-based sequence-to-sequence architecture that generates mel-spectrograms. Our model demonstrates that attention can be applied successfully to singing synthesis. The system requires considerably less explicit modelling of voice features, such as F0 patterns, vibrato, and note and phoneme durations, than most models in the literature. However, we observe that dispensing entirely with duration modelling introduces occasional instabilities in the generated spectrograms. Finally, we train an autoregressive WaveNet on a combination of speech and singing data and use it as a neural vocoder to synthesise the mel-spectrograms produced by the sequence-to-sequence architecture.
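To make the conditioning input concrete, the sketch below shows one plausible form of a phoneme-level multi-stream note embedding in PyTorch. This is an illustrative assumption, not the authors' implementation: the stream vocabularies, embedding dimension, and the linear fusion layer are hypothetical choices, and only the idea of combining per-phoneme pitch, duration, and phoneme-identity streams follows the description above.

```python
# Minimal sketch (assumed, not the paper's code): a phoneme-level
# multi-stream note embedding built from three integer-coded streams,
# one value per phoneme position in the score.
import torch
import torch.nn as nn

class MultiStreamNoteEmbedding(nn.Module):
    def __init__(self, n_pitches=128, n_durations=64, n_phonemes=50, dim=64):
        super().__init__()
        self.pitch_emb = nn.Embedding(n_pitches, dim)      # note pitch (e.g. MIDI number)
        self.duration_emb = nn.Embedding(n_durations, dim)  # quantised note duration
        self.phoneme_emb = nn.Embedding(n_phonemes, dim)    # phoneme identity
        self.proj = nn.Linear(3 * dim, dim)                  # fuse the three streams

    def forward(self, pitch, duration, phoneme):
        # Each input: (batch, n_phoneme_positions) tensor of integer indices.
        x = torch.cat([self.pitch_emb(pitch),
                       self.duration_emb(duration),
                       self.phoneme_emb(phoneme)], dim=-1)
        return self.proj(x)  # (batch, n_phoneme_positions, dim)

# The resulting embedding sequence would then condition an attention-based
# sequence-to-sequence decoder that predicts mel-spectrogram frames.
```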
Highlights
- We show a videorealistic avatar, XpressiveTalk, which speaks in various emotional tones.
- The use as an assistive technology for people with autism spectrum conditions (ASC) is discussed.
- We carry out emotion-recognition and preference studies.
- Adults with ASC are less accurate than controls, but above chance levels for inferring emotions.