Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.170

No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures

Abstract: We study relationships between spoken language and co-speech gestures in the context of two key challenges. First, the distributions of text and gestures are inherently skewed, making it important to model the long tail. Second, gesture predictions are made at a subword level, making it important to learn relationships between language and acoustic cues. We introduce Adversarial Importance Sampled Learning (or AISLe), which combines adversarial learning with importance sampling to strike a balance between precision and…
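The abstract only names the ingredients of AISLe, so the following is a minimal sketch of the general idea, assuming a PyTorch setup: inverse-frequency importance weights reweight a per-sample adversarial loss so that rare gesture clusters in the long tail are not drowned out by the head of the distribution. The weighting scheme, cluster counts, and all variable names here are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch only: combines importance sampling with an
# adversarial loss, in the spirit of what the abstract describes.
# The weighting scheme, names, and toy data are all assumptions.
import torch
import torch.nn as nn

def importance_weights(class_counts, smoothing=1.0):
    """Inverse-frequency weights: rare gesture clusters in the long
    tail get larger weights (assumed scheme, not the paper's)."""
    freqs = class_counts.float() + smoothing
    w = 1.0 / freqs
    return w / w.sum() * len(class_counts)  # normalise to mean ~1

# Toy skewed distribution over 4 hypothetical gesture clusters.
counts = torch.tensor([900, 60, 30, 10])
w = importance_weights(counts)

# Hypothetical discriminator logits for 4 generated samples, each
# assigned to one gesture cluster.
cluster = torch.tensor([0, 0, 1, 3])
logits = torch.randn(4, requires_grad=True)

# Non-saturating adversarial generator loss, reweighted per sample
# by the importance weight of its cluster.
bce = nn.BCEWithLogitsLoss(reduction="none")
loss = (w[cluster] * bce(logits, torch.ones_like(logits))).mean()
loss.backward()
print(float(loss))
```

Under this assumed scheme, samples from the rarest cluster contribute roughly 90 times the gradient weight of head-cluster samples, which is the balance between head precision and tail coverage the abstract alludes to.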

Cited by 40 publications (25 citation statements)
References 47 publications
“…They may also scale better to large datasets. However, despite several attempts [1,26,53], there have in our view been no convincing demonstrations of recent data-driven approaches consistently generating gestures with a clear semantic relation to the speech content. For example, in terms of subjective gesture appropriateness for the speech, no system in the 2020 GENEA gesture-generation challenge [27] surpassed a bottom line that simply paired the input speech audio with mismatched excerpts of training data motion, completely unrelated to the speech.…”
Section: Introduction
confidence: 84%
“…While early hand gesture-generation systems mainly relied on rule-based approaches [6,24,37,40], data-driven gesture generation has become an important research area in recent years [1,13,26,53,54]. Both paradigms have advantages and disadvantages.…”
Section: Introduction
confidence: 99%
“…When we interact with embodied conversational agents, we expect a similar manner of nonverbal communication as when interacting with humans. One way to achieve more human-like nonverbal behaviour in conversational agents is through the use of data-driven methods, which learn model parameters from data and have gained in popularity over the past few years [1,17,18,40]. Data-driven methods have been used to generate lip synchronisation, eye gaze or facial expressions; however, in this work we take co-speech gestures as a test bed for comparing evaluation methods.…”
Section: Introduction
confidence: 99%
“…Objective measures rely on an algorithmic approach to return a quantitative measure of the quality of the behaviour and are entirely automated, while subjective measures instead rely on ratings by human observers. Most recent papers on co-speech gesture generation report objective measures to assess the quality of the generated behaviour, with measures such as velocity diagrams or average jerk being popular [1,16,40]. These measures not only are easy to automate, but also allow comparisons across models.…”
Section: Introduction
confidence: 99%
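As a gloss on the objective measures named in the statement above, here is a minimal sketch of an average-jerk computation, assuming jerk is approximated as the third finite difference of joint positions at a fixed frame rate; the array shapes, frame rate, and function name are illustrative, not taken from any of the cited papers.

```python
# Illustrative sketch of "average jerk" as an objective measure for
# generated motion: assumed here to be the mean magnitude of the
# third finite difference of joint positions over time.
import numpy as np

def average_jerk(positions, fps=30.0):
    """positions: (frames, joints, 3) joint coordinates.
    Returns the mean jerk magnitude over frames and joints."""
    dt = 1.0 / fps
    # Third-order finite difference along the time axis.
    jerk = np.diff(positions, n=3, axis=0) / dt**3
    return float(np.linalg.norm(jerk, axis=-1).mean())

# Toy motion clip: 120 frames, 15 joints, 3-D positions that drift
# smoothly via a cumulative random walk.
motion = np.cumsum(np.random.randn(120, 15, 3) * 0.01, axis=0)
print(average_jerk(motion))
```

Lower average jerk is usually read as smoother motion, which is why such measures are easy to automate and compare across models, though, as the quoted passage notes, they say little about semantic appropriateness.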