Proceedings of the 2020 International Conference on Multimodal Interaction
DOI: 10.1145/3382507.3418815

Gesticulator: A framework for semantically-aware speech-driven gesture generation

Abstract: During speech, people spontaneously gesticulate, which plays a key role in conveying information. Similarly, realistic co-speech gestures are crucial to enable natural and smooth interactions with social agents. Current end-to-end co-speech gesture generation systems use a single modality for representing speech: either audio or text. These systems are therefore confined to producing either acoustically-linked beat gestures or semantically-linked gesticulation (e.g., raising a hand when saying "high"): they ca…

Cited by 119 publications (121 citation statements)
References 37 publications
“…Previous studies suggest that motion quality (human-likeness) may influence gesture appropriateness ratings in subjective evaluations [31,61]. Our experiments only partly managed to separate these two aspects of gesture perception.…”
Section: Discussion of the Challenge Results (mentioning)
confidence: 75%
“…The distance between speed histograms has also been used to evaluate gesture quality [29,31], since well-trained models should produce motion with similar properties to that of the actor it was trained on. In particular, it should have a similar motion-speed profile for any given joint.…”
Section: Comparing Speed Histograms (mentioning)
confidence: 99%
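The statement above describes evaluating generated motion by comparing per-joint speed histograms against those of the original actor. Below is a minimal sketch of that idea, assuming joint positions are given at a fixed frame rate; the bin count and the use of the Hellinger distance are illustrative choices, not the exact metric of the cited works.

# Sketch of a per-joint speed-histogram comparison for gesture motion.
# Assumptions (not taken from the cited papers): positions have shape
# (frames, joints, 3), speeds are binned with shared edges, and the
# histograms are compared with the Hellinger distance.
import numpy as np

def joint_speeds(positions: np.ndarray, fps: float) -> np.ndarray:
    """Per-frame speed of every joint, in position units per second."""
    velocity = np.diff(positions, axis=0) * fps        # (frames-1, joints, 3)
    return np.linalg.norm(velocity, axis=-1)           # (frames-1, joints)

def speed_histogram(speeds: np.ndarray, bins: np.ndarray) -> np.ndarray:
    """Normalised speed histogram for a single joint."""
    counts, _ = np.histogram(speeds, bins=bins)
    return counts / max(counts.sum(), 1)

def hellinger(p: np.ndarray, q: np.ndarray) -> float:
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def compare_joint(real_pos, generated_pos, joint: int, fps: float = 30.0) -> float:
    """Distance between the speed profiles of one joint in real vs. generated motion."""
    real_speed = joint_speeds(real_pos, fps)[:, joint]
    gen_speed = joint_speeds(generated_pos, fps)[:, joint]
    # Shared bin edges so the two histograms are directly comparable.
    bins = np.linspace(0.0, max(real_speed.max(), gen_speed.max()), 51)
    return hellinger(speed_histogram(real_speed, bins),
                     speed_histogram(gen_speed, bins))

A well-trained model should yield a small distance for every joint, i.e. a motion-speed profile close to that of the actor it was trained on.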
“…Moving forward, neural networks were employed to predict a sequence of frames for gestures (Hasegawa et al, 2018), head motions (Sadoughi and Busso, 2018) and body motions (Shlizerman et al, 2018;Ahuja et al, 2019;Ginosar et al, 2019;Ferstl et al, 2019) conditioned on a speech input while Yoon et al (2019) uses only a text input. Unlike these approaches, Kucherenko et al (2020) rely on both speech and language for gesture generation. But their choice of early fusion to combine the modalities ignores multi-scale correlations (Tsai et al, 2019) between speech and language.…”
Section: Related Work (mentioning)
confidence: 99%
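The statement above contrasts early fusion with models of multi-scale cross-modal correlations. The sketch below illustrates what frame-level early fusion of audio and text features looks like in general; the feature dimensions, the MLP decoder, and all names are illustrative assumptions rather than the actual Gesticulator architecture.

# Sketch of frame-level early fusion of speech audio and text features.
# All dimensions and module choices here are hypothetical placeholders.
import torch
import torch.nn as nn

class EarlyFusionGestureModel(nn.Module):
    def __init__(self, audio_dim=26, text_dim=768, hidden_dim=256, pose_dim=45):
        super().__init__()
        # Early fusion: both modalities are concatenated per frame before any
        # further processing, so cross-modal interactions are only modelled at
        # this single (frame-level) time scale.
        self.decoder = nn.Sequential(
            nn.Linear(audio_dim + text_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, pose_dim),
        )

    def forward(self, audio_feats, text_feats):
        # audio_feats: (batch, frames, audio_dim), e.g. acoustic features
        # text_feats:  (batch, frames, text_dim), word embeddings upsampled
        #              to the audio frame rate
        fused = torch.cat([audio_feats, text_feats], dim=-1)
        return self.decoder(fused)   # (batch, frames, pose_dim)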
“…
Model                                   Expressivity   Naturalness   Relevance    Timing
S2G (Ginosar et al, 2019)               24.6 ± 3.1     22.1 ± 1.8    22.4 ± 1.7   27.6 ± 1.7
Gesticulator (Kucherenko et al, 2020)   31.9 ± 2.0     32.1 ± 1.7    31.4 ± 1.8   31.1 ± 1.7
Ours w/o G attn                         35.0 ± 2.3     29.2 ± 1.7    30.9 ± 1.8   30.8 ± 1.7
Ours w/o AISLe                          35.8 ± 2.9     35.7 ± 1.7    33.7 ± 1.7   32.1 ± 1.7
Ours                                    38.9 ± 1.7     36.7 ± 1.6    37.1 ± 1.7   35.3 ± 1.7
…”
Section: Models (mentioning)
confidence: 99%