Bertsokantari: a TTS Based Singing Synthesis System

Blanco, Eder del; Hernáez, Inmaculada; Navas, Eva; Sarasola, Xabier; Erro, Daniel

doi:10.21437/interspeech.2016-1123

Cited by 2 publications

(4 citation statements)

References 16 publications


“…Table 7.1.2 summarizes the languages, numbers of submitted songs, voice genders and participating labs. For a detailed description of each system, the reader is referred to [49]: the WBHSM concatenative synthesizer (UPF, Barcelona) [16], ISIS, the Ircam Singing Synthesizer (Paris) [52], the Seraphim system (A*STAR, Singapore) [53], the Bertsokantari system (UPV, Bilbao) [54], the ACAPELA singing synthesis system (Mons) [55], and Calliphony, an earlier implementation of C-Voks. For the sake of simplicity, the system is coined C-Voks.…”

Section: Participant To the Challenge And Test Methodology (mentioning)

confidence: 99%

Voks: Digital instruments for chironomic control of voice samples

Locqueville

d’Alessandro

Delalez

et al. 2020

Speech Communication


This paper presents Voks, a new family of digital instruments that allow for real-time control and modification of pre-recorded voice signal samples. An instrument based on Voks is made of Voks itself, the synthesis software and a given set of chironomic (hand-driven) interfaces. Rhythm can be accurately controlled thanks to a new methodology, based on syllabic control points. Timing can also be controlled with other methods, including scrubbing and playback speed variation. Pitch, vocal effort, voice tension, apparent vocal tract size, voicing ratio and aperiodicity ratio of the voice samples can be modified thanks to a real-time high-quality vocoder. Different forms of chironomic control of the vocal parameters are proposed. Pitch is controlled by continuous hand motions using a stylus on a surface (C-Voks) or a theremin (T-Voks). Other interfaces can be used as well. Syllabic rhythm is controlled using a biphasic button. Scrubbing, playback speed and timbre-related parameters can be controlled using the theremin, control surfaces or continuous controllers like faders. In addition to realistic imitation of speaking or singing voices, other playing modes yield new interesting sounds. Voks participated in a comparative perceptual evaluation of singing synthesis systems. It has been demonstrated in live musical settings, using different control interfaces. In addition to musical or poetic performances, applications of performative vocal synthesis to language learning and speech reeducation are foreseen.


“…For instance, in [22], the synthetic speech was converted into singing according to a MIDI file input, using STRAIGHT to perform the analysis, transformation and synthesis. In [17], an HMM-based TTS synthesiser for Basque was used to generate a singing voice. The parameters provided by the TTS system for the spoken version of the lyrics were modified to adapt them to the requirements of the score.…”

Section: Singing Synthesis (mentioning)

confidence: 99%

“…The audios generated for one of the five scores have been provided as Additional files 1 to 24. Forty-nine Spanish native speakers took part in the test.…”

Section: Subjective Evaluation, 4.3.1 MUSHRA Test Setup (mentioning)

confidence: 99%

“…In order to add singing capabilities to a corpus-based TTS system, the first idea that may come to mind is to incorporate a supplementary singing database. However, occasional singing needs do not justify the cost of building an additional corpus, which may become unfeasible if the original speaker is unavailable or unable to sing properly [17,18]. As an alternative, we could take advantage of those approaches which focus on the production of singing from speech following the so-called speech-to-singing (STS) conversion [19][20][21].…”

Section: Introduction (mentioning)

confidence: 99%


A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

Freixes

Alías

Socoró

2019

J AUDIO SPEECH MUSIC PROC.


Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid devices, which may also require singing. To enable a corpus-based TTS system to sing, a supplementary singing database should be recorded. This solution, however, might be too costly for eventual singing needs, or even unfeasible if the original speaker is unavailable or unable to sing properly. This work introduces a unit selection-based text-to-speech-and-singing (US-TTS&S) synthesis framework, which integrates speech-to-singing (STS) conversion to enable the generation of both speech and singing from an input text and a score, respectively, using the same neutral speech corpus. The viability of the proposal is evaluated considering three vocal ranges and two tempos on a proof-of-concept implementation using a 2.6-h Spanish neutral speech corpus. The experiments show that challenging STS transformation factors are required to sing beyond the corpus vocal range and/or with notes longer than 150 ms. While score-driven US configurations allow the reduction of pitch-scale factors, time-scale factors are not reduced due to the short length of the spoken vowels. Moreover, in the MUSHRA test, text-driven and score-driven US configurations obtain similar naturalness rates of around 40 for all the analysed scenarios. Although these naturalness scores are far from those of Vocaloid, the singing scores of around 60 which were obtained validate that the framework could reasonably address eventual singing needs.
