2021
DOI: 10.1002/cav.2016

ExpressGesture: Expressive gesture generation from speech through database matching

Abstract: Co-speech gestures are a vital ingredient in making virtual agents more human-like and engaging. Automatically generated gestures based on speech input often lack realistic and defined gesture form. We present a database-driven approach guaranteeing defined gesture form. We built a large corpus of over 23,000 motion-captured co-speech gestures and select individual gestures based on expressive gesture characteristics that can be estimated from speech audio. The expressive parameters are gesture velocity and ac…
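The database-matching step the abstract describes can be made concrete with a short sketch. The following is a minimal, hypothetical illustration, not the paper's implementation: it assumes each corpus entry stores precomputed velocity and acceleration values, and it retrieves the clip whose expressive parameters lie closest (in weighted squared distance) to values estimated from the speech audio. All names here (GestureEntry, match_gesture, the toy clip IDs) are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class GestureEntry:
    """One motion-captured gesture clip with precomputed expressive parameters."""
    clip_id: str
    velocity: float      # e.g., mean hand velocity over the clip
    acceleration: float  # e.g., mean hand acceleration over the clip

def match_gesture(entries, target_velocity, target_acceleration,
                  w_vel=1.0, w_acc=1.0):
    """Return the clip whose (velocity, acceleration) pair is closest,
    under a weighted squared distance, to values estimated from speech."""
    def distance(e):
        return (w_vel * (e.velocity - target_velocity) ** 2
                + w_acc * (e.acceleration - target_acceleration) ** 2)
    return min(entries, key=distance)

# Toy usage: three database clips and parameters estimated from audio.
db = [
    GestureEntry("clip_017", velocity=0.4, acceleration=1.2),
    GestureEntry("clip_042", velocity=0.9, acceleration=2.8),
    GestureEntry("clip_003", velocity=0.2, acceleration=0.5),
]
print(match_gesture(db, target_velocity=0.85, target_acceleration=2.5).clip_id)
# -> clip_042
```

The weights w_vel and w_acc are a common way to trade off the influence of each parameter; whether and how the paper weights them is not stated in this excerpt.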

Cited by 25 publications (8 citation statements) | References 25 publications
“…beat gestures) in the context of acoustic prosody has been studied heavily since the era of rule-based gesture synthesis [CMM99, MXL*13]. Fast-forwarding to data-driven synthesis, some approaches explicitly rely on extracted prosodic features [FNM21], while others [GBK*19, ALNM20] learn implicit embeddings from acoustics, of which prosody is one of the key components. It seems clear that gesture production must be grounded in the rhythm of audio data, and appropriate beat gestures will be challenging to achieve from text transcriptions alone, without timing information [KNN*22].…”
Sections: Multimodal Grounding; Key Challenges Of Gesture Generation (mentioning)
confidence: 99%
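As a concrete illustration of the prosodic grounding this passage describes, the sketch below extracts two standard prosodic streams, frame-wise F0 (pitch) and RMS energy, from a speech waveform. It is a minimal example assuming the librosa library and is not the feature set of any specific cited system.

```python
import librosa

def prosodic_features(wav_path, sr=16000):
    """Extract basic prosodic streams often used to time beat gestures:
    frame-wise F0 (pitch) and RMS energy."""
    y, sr = librosa.load(wav_path, sr=sr)
    # pYIN pitch tracking; unvoiced frames are returned as NaN.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"),
        sr=sr,
    )
    energy = librosa.feature.rms(y=y)[0]
    return f0, energy
```

Peaks in the energy stream and voiced stretches of F0 carry exactly the kind of timing information the quoted passage argues text transcriptions alone cannot provide.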
“…For example, the work of Zhuang et al. [76] uses a transformer-based encoder-decoder for face animation and a motion-graph retrieval module for body animation. Another example is the work of Ferstl et al. [19], who generate parameters such as the acceleration or velocity of motion from the audio before finding a corresponding motion in a database.…”
Section: Data-driven Approaches (mentioning)
confidence: 99%
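The two-stage pipeline attributed to Ferstl et al. in this excerpt (predict expressive motion parameters from audio, then look up a matching clip) can be sketched end to end. The toy version below uses a ridge-regression mapping and nearest-neighbour retrieval; the feature dimensions, the synthetic data, and the choice of a linear model are all illustrative assumptions, not the cited method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: learn a linear (ridge) map from pooled audio features
# (e.g., F0/energy statistics) to [velocity, acceleration] targets.
X_train = rng.normal(size=(200, 4))           # 4 pooled audio features
Y_train = X_train @ rng.normal(size=(4, 2))   # synthetic [vel, acc] targets
lam = 1e-2                                    # ridge regularizer
W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(4),
                    X_train.T @ Y_train)

# Stage 2: nearest-neighbour retrieval over precomputed clip parameters.
db_params = rng.normal(size=(50, 2))          # 50 clips, [vel, acc] each
x_new = rng.normal(size=(1, 4))               # features of a new utterance
y_pred = x_new @ W                            # predicted [vel, acc]
best_clip = int(np.argmin(np.linalg.norm(db_params - y_pred, axis=1)))
print("retrieved clip index:", best_clip)
```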