2021
DOI: 10.1080/10447318.2021.1883883
Moving Fast and Slow: Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation

Cited by 32 publications (21 citation statements)
References 40 publications
“…Many data-driven systems have only considered a single speech modality – either audio recordings or text transcriptions thereof – as input to the gesture generation, e.g., [3,25,39,54]. However, the field is now shifting to use both audio and text together [1,9,26,53].…”
Section: Effect of the Speech Input Modality
Confidence: 99%
“…These prosodic features are commonly used in speech emotion analysis as well as for gesture property prediction, e.g., [56]. We normalised pitch and intensity as in [8,25]: the pitch values were adjusted by taking log(x + 1) − 4 and setting negative values to zero, and the intensity values were adjusted by taking log(x) − 3. The audio features were first extracted at 200 fps and then resampled to 5 fps by averaging, to match the resolution of the gesture annotations.…”
Section: Speech Modalities and Their Encoding
Confidence: 99%
“…The transfer of physical human movement to virtual avatars is certainly not a novel concept. Broadly speaking, human movement behavior has long been of interest to scholars examining natural mapping (Birk and Mandryk, 2013; Vanden Abeele et al., 2013), intelligent virtual agents (Gratch et al., 2002; Thiebaux et al., 2008; Marsella et al., 2013; Kucherenko et al., 2021), VR-based gesture tracking (Won et al., 2012; Christou and Michael, 2014), and pose estimation of anatomical keypoints (Andriluka et al., 2010; Pishchulin et al., 2012; Cao et al., 2017). Natural mapping motion capture systems, which generate virtual avatar representations based on physical human behavior, vary from 3D pose estimation [see (Wang et al., 2021) for a review] to facial expression sensors (Lugrin et al., 2016).…”
Section: Person-based Actions
Confidence: 99%
“…Objective measures rely on an algorithmic approach to return a quantitative measure of the quality of the behaviour and are entirely automated, while subjective measures instead rely on ratings by human observers. Most recent papers on co-speech gesture generation report objective measures to assess the quality of the generated behaviour, with measures such as velocity diagrams or average jerk being popular [1,16,40]. These measures are not only easy to automate, but also allow comparisons across models.…”
Section: Introduction
Confidence: 99%
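Average jerk, one of the objective measures that citation statement mentions, is the mean magnitude of the third time-derivative of joint positions: smoother, more natural motion yields lower jerk. A minimal finite-difference sketch, assuming motion is given as a (frames, joints, 3) position array at a known frame rate (the exact formulation varies across the cited papers):

```python
import numpy as np

def average_jerk(positions, fps=30):
    # positions: (frames, joints, 3) array of joint positions over time.
    # Jerk is the third derivative of position; approximate it with
    # third-order finite differences along the time axis.
    dt = 1.0 / fps
    jerk = np.diff(positions, n=3, axis=0) / dt**3
    # Mean Euclidean magnitude over all remaining frames and joints.
    return float(np.mean(np.linalg.norm(jerk, axis=-1)))
```

Because jerk is computed purely from the generated motion, it is fully automated and comparable across models, but it says nothing about whether gestures are semantically appropriate for the speech, which is why subjective ratings remain necessary.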