Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.170

No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures

Abstract: We study relationships between spoken language and co-speech gestures in the context of two key challenges. First, the distributions of text and gestures are inherently skewed, making it important to model the long tail. Second, gesture predictions are made at a subword level, making it important to learn relationships between language and acoustic cues. We introduce Adversarial Importance Sampled Learning (or AISLe), which combines adversarial learning with importance sampling to strike a balance between precision and…
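The abstract only names the ingredients of AISLe, so the following is a minimal sketch of the general idea, assuming a PyTorch setup: inverse-frequency importance weights reweight a per-sample adversarial loss so that rare gesture clusters in the long tail are not drowned out by the head of the distribution. The weighting scheme, cluster counts, and all variable names here are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch only: combines importance sampling with an
# adversarial loss, in the spirit of what the abstract describes.
# The weighting scheme, names, and toy data are all assumptions.
import torch
import torch.nn as nn

def importance_weights(class_counts, smoothing=1.0):
    """Inverse-frequency weights: rare gesture clusters in the long
    tail get larger weights (assumed scheme, not the paper's)."""
    freqs = class_counts.float() + smoothing
    w = 1.0 / freqs
    return w / w.sum() * len(class_counts)  # normalise to mean ~1

# Toy skewed distribution over 4 hypothetical gesture clusters.
counts = torch.tensor([900, 60, 30, 10])
w = importance_weights(counts)

# Hypothetical discriminator logits for 4 generated samples, each
# assigned to one gesture cluster.
cluster = torch.tensor([0, 0, 1, 3])
logits = torch.randn(4, requires_grad=True)

# Non-saturating adversarial generator loss, reweighted per sample
# by the importance weight of its cluster.
bce = nn.BCEWithLogitsLoss(reduction="none")
loss = (w[cluster] * bce(logits, torch.ones_like(logits))).mean()
loss.backward()
print(float(loss))
```

Under this assumed scheme, samples from the rarest cluster contribute roughly 90 times the gradient weight of head-cluster samples, which is the balance between head precision and tail coverage the abstract alludes to.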

Cited by 40 publications (25 citation statements)
References 47 publications
“…They may also scale better to large datasets. However, despite several attempts [1,26,53], there have in our view been no convincing demonstrations of recent data-driven approaches consistently generating gestures with a clear semantic relation to the speech content. For example, in terms of subjective gesture appropriateness for the speech, no system in the 2020 GENEA gesture-generation challenge [27] surpassed a bottom line that simply paired the input speech audio with mismatched excerpts of training data motion, completely unrelated to the speech.…”
Section: Introduction
confidence: 84%
“…While early hand gesture-generation systems mainly relied on rule-based approaches [6,24,37,40], data-driven gesture generation has become an important research area in recent years [1,13,26,53,54]. Both paradigms have advantages and disadvantages.…”
Section: Introduction
confidence: 99%
“…When we interact with embodied conversational agents, we expect a similar manner of nonverbal communication as when interacting with humans. One way to achieve more human-like nonverbal behaviour in conversational agents is through the use of data-driven methods, which learn model parameters from data and have gained in popularity over the past few years [1,17,18,40]. Data-driven methods have been used to generate lip synchronisation, eye gaze or facial expressions; however, in this work we take co-speech gestures as a test bed for comparing evaluation methods.…”
Section: Introduction
confidence: 99%
“…Objective measures rely on an algorithmic approach to return a quantitative measure of the quality of the behaviour and are entirely automated, while subjective measures instead rely on ratings by human observers. Most recent papers on co-speech gesture generation report objective measures to assess the quality of the generated behaviour, with measures such as velocity diagrams or average jerk being popular [1,16,40]. These measures not only are easy to automate, but also allow comparisons across models.…”
Section: Introduction
confidence: 99%
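As a gloss on the objective measures named in the statement above, here is a minimal sketch of an average-jerk computation, assuming jerk is approximated as the third finite difference of joint positions at a fixed frame rate; the array shapes, frame rate, and function name are illustrative, not taken from any of the cited papers.

```python
# Illustrative sketch of "average jerk" as an objective measure for
# generated motion: assumed here to be the mean magnitude of the
# third finite difference of joint positions over time.
import numpy as np

def average_jerk(positions, fps=30.0):
    """positions: (frames, joints, 3) joint coordinates.
    Returns the mean jerk magnitude over frames and joints."""
    dt = 1.0 / fps
    # Third-order finite difference along the time axis.
    jerk = np.diff(positions, n=3, axis=0) / dt**3
    return float(np.linalg.norm(jerk, axis=-1).mean())

# Toy motion clip: 120 frames, 15 joints, 3-D positions that drift
# smoothly via a cumulative random walk.
motion = np.cumsum(np.random.randn(120, 15, 3) * 0.01, axis=0)
print(average_jerk(motion))
```

Lower average jerk is usually read as smoother motion, which is why such measures are easy to automate and compare across models, though, as the quoted passage notes, they say little about semantic appropriateness.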