2020
DOI: 10.1111/cgf.13946

Style‐Controllable Speech‐Driven Gesture Synthesis Using Normalising Flows

Abstract: Automatic synthesis of realistic gestures promises to transform the fields of animation, avatars and communicative agents. In off‐line applications, novel tools can alter the role of an animator to that of a director, who provides only high‐level input for the desired animation; a learned network then translates these instructions into an appropriate sequence of body poses. In interactive scenarios, systems for generating natural animations on the fly are key to achieving believable and relatable characters. I…

Citations: cited by 118 publications (126 citation statements)
References: 37 publications
“…Future work also involves making the model stochastic (as in [2]), using larger datasets (such as [29]) and further improving the semantic coherence of the gestures, for instance by treating different gesture types separately.…”
Section: Discussion (mentioning)
confidence: 99%
“…The recurrent connections used in several models [13,19,48] can also act as a pose memory that may help the model to produce smooth output motion. Autoregressive motion models have recently demonstrated promising results in probabilistic audio-driven gesture generation [2]. In this paper, we similarly investigate autoregressive connections for improving motion quality, which explicitly provide the most recent poses as input to the model when generating the next pose.…”
Section: Regarding Motion Continuity (mentioning)
confidence: 99%
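
The autoregressive connection described in the statement above (feeding the most recent poses back in when generating the next pose) can be illustrated with a minimal sketch. This is not the implementation from the cited or citing papers: the function name `generate_autoregressive`, the `step_fn` interface and the toy model are all assumptions made for illustration.

```python
import numpy as np

def generate_autoregressive(speech_feats, init_poses, step_fn):
    """Roll out poses frame by frame, feeding recent poses back as input.

    speech_feats: (T, D_s) array of per-frame speech features.
    init_poses:   (K, D_p) array that seeds the pose history.
    step_fn:      hypothetical model mapping
                  (recent poses (K, D_p), speech frame (D_s,)) -> pose (D_p,).
    """
    history = [p for p in np.asarray(init_poses, dtype=float)]
    k = len(history)  # size of the pose context window
    out = []
    for t in range(len(speech_feats)):
        recent = np.stack(history[-k:])          # the K most recent poses
        pose = step_fn(recent, speech_feats[t])  # predict the next pose
        out.append(pose)
        history.append(pose)                     # generated pose becomes context
    return np.stack(out)

def toy_step(recent, speech):
    # Stand-in for a learned network (assumption): average of the recent
    # poses, shifted by a scalar summary of the current speech frame.
    return recent.mean(axis=0) + 0.1 * speech.mean()

poses = generate_autoregressive(np.ones((5, 4)), np.zeros((2, 3)), toy_step)
print(poses.shape)
```

In a probabilistic model, `step_fn` would sample a pose rather than return a single deterministic value; the rollout loop itself stays the same.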
“…In recent work, recurrent neural networks have proven popular; a classic training loss has been employed for English [12,21] and Japanese speech-to-gesture generation [18,20]. To combat the problem of mean pose regression in a standard training paradigm, an adversarial training paradigm has been proposed in [14] (similarly for a convolutional network setup in [15]), and recently, probabilistic generative modelling has shown promise [1]. However, due to the highly indeterministic input-to-output relation, modelling plausible gestures remains a difficult problem.…”
Section: Related Work (mentioning)
confidence: 99%
“…(3.1) path length (3.2) major axis length (4) arm swivel (5) hand opening Velocity and initial acceleration both describe the kinematics of the gesture, represented by the maximum stroke velocity (1), and by the mean acceleration to the first major velocity peak (2). Velocity captures a character's tempo and relates to the amount of energy they are using.…”
Section: Gesture Processing (mentioning)
confidence: 99%
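
The two kinematic features named in the statement above, maximum stroke velocity and mean acceleration to the first major velocity peak, could be computed along the following lines. This is a hedged sketch and not the citing paper's procedure: the input format (a sequence of 3D positions at a fixed frame rate), the function name `stroke_kinematics`, and the `peak_fraction` rule for what counts as a "major" peak are all assumptions.

```python
import numpy as np

def stroke_kinematics(positions, fps, peak_fraction=0.5):
    """Return (max speed, mean acceleration up to the first major speed peak).

    positions:     (T, 3) array of e.g. wrist positions, one row per frame.
    fps:           frames per second of the recording.
    peak_fraction: assumed rule: a local maximum counts as "major" if it
                   reaches at least this fraction of the maximum speed.
    """
    positions = np.asarray(positions, dtype=float)
    # Finite-difference speed between consecutive frames, in units per second.
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps
    max_velocity = float(speed.max())

    threshold = peak_fraction * max_velocity
    peak_idx = int(np.argmax(speed))  # fallback: the global maximum
    for i in range(1, len(speed) - 1):
        is_local_max = speed[i] >= speed[i - 1] and speed[i] > speed[i + 1]
        if is_local_max and speed[i] >= threshold:
            peak_idx = i
            break

    if peak_idx == 0:
        mean_accel = 0.0  # the peak is the first sample: no rise to measure
    else:
        # Change in speed divided by elapsed time (peak_idx / fps seconds).
        mean_accel = float((speed[peak_idx] - speed[0]) * fps / peak_idx)
    return max_velocity, mean_accel

# Toy trajectory along the x axis that speeds up and then slows down.
xs = np.array([0.0, 1.0, 3.0, 6.0, 8.0, 9.0])
traj = np.stack([xs, np.zeros_like(xs), np.zeros_like(xs)], axis=1)
max_v, mean_a = stroke_kinematics(traj, fps=10)
print(max_v, mean_a)
```

Real motion-capture data would normally be smoothed before differentiation, since finite differences amplify noise.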