2015
DOI: 10.1109/msp.2015.2424572

Expression Control in Singing Voice Synthesis: Features, approaches, evaluation, and challenges

Abstract: In the context of singing voice synthesis, expression control manipulates a set of voice features related to a particular emotion, style, or singer. Also known as performance modeling, it has been approached from different perspectives and for different purposes, and different projects have shown a wide extent of applicability. The aim of this article is to provide an overview of approaches to expression control in singing voice synthesis. Section I introduces some musical applications that use singing voice s…

Cited by 33 publications (31 citation statements)
References: 26 publications
“…We list errors for note onsets, offsets, and consonant durations separately to ensure that the fitting heuristic affects the results only minimally. • F0 metrics: Standard F0 metrics such as RMSE are given, but these metrics often correlate poorly with perceptual ratings of singing [41]. For instance, starting a vibrato slightly earlier or later than the reference may be equally valid musically, but can cause the two F0 contours to go out of phase, resulting in large distances.…”
mentioning
confidence: 99%
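The vibrato-phase problem described in this statement can be reproduced numerically. The sketch below is an illustration, not the citing paper's evaluation code: it assumes F0 RMSE is computed in cents over equally sampled, fully voiced frames, and the helper `vibrato_contour` with its rate, depth, and onset values is hypothetical. Two contours with identical vibrato rate and depth, differing only by a 90 ms later vibrato onset, end up nearly in antiphase and yield a large RMSE.

```python
import math

def hz_to_cents(f0_hz, ref_hz=440.0):
    """Convert a frequency in Hz to cents relative to ref_hz."""
    return 1200.0 * math.log2(f0_hz / ref_hz)

def f0_rmse_cents(contour_a, contour_b):
    """RMSE in cents between two equally sampled, voiced F0 contours (Hz)."""
    if len(contour_a) != len(contour_b) or not contour_a:
        raise ValueError("contours must be non-empty and of equal length")
    sq = [(hz_to_cents(a) - hz_to_cents(b)) ** 2 for a, b in zip(contour_a, contour_b)]
    return math.sqrt(sum(sq) / len(sq))

def vibrato_contour(base_hz, rate_hz, depth_cents, onset_s, dur_s, frame_s=0.005):
    """Hypothetical test contour: a steady note whose sinusoidal vibrato starts at onset_s."""
    n = int(round(dur_s / frame_s))
    out = []
    for i in range(n):
        t = i * frame_s
        dev = 0.0
        if t >= onset_s:
            dev = depth_cents * math.sin(2.0 * math.pi * rate_hz * (t - onset_s))
        out.append(base_hz * 2.0 ** (dev / 1200.0))
    return out

ref = vibrato_contour(440.0, 5.5, 50.0, onset_s=0.30, dur_s=1.5)
same = vibrato_contour(440.0, 5.5, 50.0, onset_s=0.30, dur_s=1.5)
late = vibrato_contour(440.0, 5.5, 50.0, onset_s=0.39, dur_s=1.5)
print(f0_rmse_cents(ref, same))  # 0.0
print(f0_rmse_cents(ref, late))  # large, although the vibrato itself is unchanged
```

At 5.5 Hz, a 90 ms shift is roughly half a vibrato cycle, so the per-frame differences approach twice the vibrato depth even though a listener would likely judge both renditions as equally plausible.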
“…Expression control in singing synthesis, also known as performance modelling, consists of manipulating a set of voice features (e.g., phonetic timing, pitch contour, vibrato, timbre) that relate to a particular emotion, style, or singer [41]. Accordingly, the expression control generation module provides the duration, F0, and spectral controls that the transformation module requires to convert the sequence of speech parameters into singing parameters.…”
Section: Expression Control Generation
mentioning
confidence: 99%
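To make the idea of an expression control generation module concrete, the sketch below covers only the F0 part: it turns a note sequence into a frame-level pitch contour with two common expressive features, note-to-note transitions and vibrato. The rule set (linear glides in cents, a fixed vibrato delay, rate, and depth) and every name and parameter value here are hypothetical choices for illustration, not the method of the cited paper or of [41].

```python
import math
from dataclasses import dataclass

@dataclass
class Note:
    midi: int        # MIDI note number (69 = A4 = 440 Hz)
    onset_s: float   # note start time in seconds
    dur_s: float     # note duration in seconds

def midi_to_cents(midi):
    """Pitch in cents relative to A4 (440 Hz)."""
    return 100.0 * (midi - 69)

def generate_f0(notes, frame_s=0.005, glide_s=0.08,
                vib_rate_hz=5.5, vib_depth_cents=40.0, vib_delay_s=0.25):
    """Frame-level F0 contour (Hz) for a sorted, legato note sequence.

    Hypothetical rules: each note after the first glides linearly (in cents)
    from the previous note's pitch over glide_s, and a sinusoidal vibrato
    starts vib_delay_s after each onset.
    """
    end_s = notes[-1].onset_s + notes[-1].dur_s
    n_frames = int(round(end_s / frame_s))
    f0 = []
    for i in range(n_frames):
        t = i * frame_s
        # Pick the last note whose onset has passed (default: first note).
        k = 0
        for j, note in enumerate(notes):
            if note.onset_s <= t:
                k = j
        local_t = t - notes[k].onset_s
        target = midi_to_cents(notes[k].midi)
        cents = target
        if k > 0 and local_t < glide_s:
            # Transition from the previous note's pitch toward the target.
            prev = midi_to_cents(notes[k - 1].midi)
            cents = prev + (target - prev) * (local_t / glide_s)
        if local_t >= vib_delay_s:
            phase = 2.0 * math.pi * vib_rate_hz * (local_t - vib_delay_s)
            cents += vib_depth_cents * math.sin(phase)
        f0.append(440.0 * 2.0 ** (cents / 1200.0))
    return f0

melody = [Note(69, 0.0, 0.5), Note(72, 0.5, 0.5)]  # A4 then C5
contour = generate_f0(melody)
```

A full module would also emit phoneme durations and spectral (timbre) controls; working in cents keeps the glide and vibrato perceptually uniform regardless of register.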
“…25. Note that, as in natural voices, vowel identity tends to disappear at high pitch, with all vowels becoming close to each other [sound example in Additional files 7 and 8]⁹.…”
Section: First Formant Tuning
mentioning
confidence: 99%
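One way to see why vowels converge at high pitch is a first-formant tuning rule in which F1 is raised so that it never falls below f0. The sketch below assumes that simple rule and uses rough, illustrative F1 values per vowel; neither the rule nor the numbers are taken from the cited paper. As f0 rises, more vowels are pulled to the same F1, and the spread across vowels shrinks to zero.

```python
# Rough, illustrative first-formant targets in Hz (assumed values, not from the paper).
NOMINAL_F1 = {"i": 300.0, "u": 330.0, "e": 450.0, "o": 500.0, "a": 750.0}

def tuned_f1(nominal_f1_hz, f0_hz, ratio=1.0):
    """Hypothetical tuning rule: keep F1 at or above ratio * f0."""
    return max(nominal_f1_hz, ratio * f0_hz)

def f1_spread(f0_hz):
    """Range (Hz) of tuned F1 values across the vowel set at a given f0."""
    values = [tuned_f1(f1, f0_hz) for f1 in NOMINAL_F1.values()]
    return max(values) - min(values)

for f0 in (220.0, 440.0, 660.0, 880.0):
    print(f0, f1_spread(f0))
# 220.0 450.0
# 440.0 310.0
# 660.0 90.0
# 880.0 0.0
```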
“…The main advantage of parametric synthesis is its flexibility and economy in terms of memory and computational load. The next generation of voice synthesis systems was based on recording, concatenation, and modification of real voice samples⁴ or on statistical parametric synthesis [9]. A formant synthesizer is preferred for Cantor Digitalis because flexibility and real-time operation are the main requirements for performative singing synthesis.…”
Section: Introduction
mentioning
confidence: 99%
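The flexibility and low cost of parametric synthesis come from describing the voice with a handful of control parameters. The sketch below is a generic cascade formant synthesizer built from Klatt-style second-order digital resonators excited by an impulse train. It is not Cantor Digitalis's implementation: the impulse-train source is a crude stand-in for a glottal flow model, and the formant frequencies and bandwidths are assumed, illustrative values.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Klatt-style resonator coefficients for y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at DC to 1
    return a, b, c

def resonate(x, freq_hz, bw_hz, fs):
    """Filter signal x through one second-order resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, dur_s, fs):
    """Crude voiced source: one unit impulse per pitch period (assumed simplification)."""
    n = int(dur_s * fs)
    period = fs / f0_hz
    out = [0.0] * n
    k = 0.0
    while int(k) < n:
        out[int(k)] = 1.0
        k += period
    return out

def formant_synth(f0_hz, formants, dur_s=0.5, fs=16000):
    """Cascade the source through resonators; formants is a list of (freq_hz, bw_hz)."""
    signal = impulse_train(f0_hz, dur_s, fs)
    for freq, bw in formants:
        signal = resonate(signal, freq, bw, fs)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to a peak amplitude of 1

# Illustrative /a/-like formants (assumed values).
vowel_a = [(750.0, 90.0), (1200.0, 110.0), (2600.0, 160.0)]
audio = formant_synth(220.0, vowel_a)
```

Because pitch and timbre are just arguments (`f0_hz`, `formants`), they can be changed frame by frame at negligible cost, which is the property that makes formant synthesis attractive for real-time, performative control.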