2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2012.6289140

VocaListener and VocaWatcher: Imitating a human singer by using signal processing

Abstract: In this paper, we describe three singing information processing systems, VocaListener, VocaListener2, and VocaWatcher, that imitate singing expressions of the voice and face of a human singer. VocaListener can synthesize natural singing voices by analyzing and imitating the pitch and dynamics of the human singing. VocaListener2 imitates temporal timbre changes in addition to the pitch and dynamics. In synchronization with the synthesized singing voices, VocaWatcher can generate realistic facial motions of a humanoid robot …
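As a rough illustration of the front-end analysis the abstract describes (not the authors' implementation), the sketch below extracts a pitch (F0) contour and a dynamics (RMS) envelope from a reference singing recording; the file name, estimator choices, and frequency range are assumptions.

```python
# Hedged sketch (assumption-based, not the authors' code): extract the two
# cues VocaListener imitates -- the pitch (F0) contour and a dynamics
# envelope -- from a reference singing recording using librosa.
import numpy as np
import librosa

y, sr = librosa.load("reference_singing.wav", sr=None)  # hypothetical file

# Pitch contour via probabilistic YIN; the C2-C6 range is an assumption.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, sr=sr,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C6"),
)

# Dynamics: frame-wise RMS energy, expressed in dB.
rms = librosa.feature.rms(y=y)[0]
dynamics_db = librosa.amplitude_to_db(rms, ref=np.max)

# In a VocaListener-style pipeline, these trajectories would be mapped to
# the pitch and dynamics controls of a singing synthesizer.
print(f0.shape, dynamics_db.shape)
```

A real system would additionally need to time-align these trajectories with the synthesizer's note and phoneme boundaries, which this sketch does not attempt.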

Citations: cited by 9 publications (7 citation statements)
References: 19 publications (21 reference statements)
“…Correlations between different acoustic parameters and listeners' perception of emotions in the singing voice (as well as in music in general) can also be studied by investigating listeners' emotion judgments of sounds in which each parameter has been systematically and independently manipulated (Scherer and Oshinsky, 1977; Kotlyar and Morozov, 1976). Procedures of synthesis and resynthesis have also been used to systematically manipulate acoustic parameters in order to investigate the effects and relevance of each parameter for listeners' emotion judgments (e.g., Goto et al., 2012; Fonseca, 2011; Kenmochi and Ohshita, 2007; Risset, 1991; Sundberg, 1978). A comparison between the acoustic patterns that characterize expressive speech and expressive singing suggests a striking parallel between the expression of emotions in the speaking and the singing voice.…”
Section: Introduction
Mentioning confidence: 99%
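As a minimal, hedged sketch of the resynthesis-style methodology mentioned in the excerpt above (not taken from any of the cited studies), the following generates listening-test stimuli in which a single acoustic parameter, pitch, is varied in controlled semitone steps while the rest of the signal is left untouched; the file name and step sizes are assumptions.

```python
# Hedged sketch: create stimuli that vary one acoustic parameter (pitch)
# in controlled steps, as in resynthesis-based studies of emotion
# perception. File name and step sizes are illustrative assumptions.
import librosa
import soundfile as sf

y, sr = librosa.load("sung_phrase.wav", sr=None)  # hypothetical recording

for n_steps in (-4, -2, 0, 2, 4):  # pitch shift in semitones
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    sf.write(f"stimulus_pitch_{n_steps:+d}.wav", shifted, sr)
```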
“…In addition to audio timbre distance metrics, such a system could involve video to estimate bow velocity or even bow-bridge distance. This type of approach was used by VocaListener and VocaWatcher [13] to create performances with Vocaloid singing synthesis [17] and a humanoid robot which matched a recording of a human singer.…”
Section: Discussion and Future Work
Mentioning confidence: 99%
“…Such a system is VocaListener2, which is able to synthesize a singing voice by taking a human voice as input, analyzing its pitch, dynamics, timbre shifts, and phonemes, and creating a singing voice based on the timbre changes of the user's singing voice. In a recent publication the authors presented VocaWatcher, which generates realistic facial expressions of a human singer and controls a humanoid robot's face [10]. This is one of the first systems we identified that focuses on imitating a real singer based on acoustic and visual expressions.…”
Section: Related Work
Mentioning confidence: 96%