2022
DOI: 10.1007/978-3-031-20071-7_36

BEAT: A Large-Scale Semantic and Emotional Multi-modal Dataset for Conversational Gestures Synthesis

Cited by 52 publications (43 citation statements) · References 47 publications
“…7.1.1 Data. We train and test our system on two high-quality speech-gesture datasets: ZeroEGGS [Ghorbani et al 2022] and BEAT [Liu et al 2022e]. The ZeroEGGS dataset contains two hours of full-body motion capture and audio from monologues performed by an English-speaking female actor in 19 different styles.…”
Section: System Setup (mentioning)
confidence: 99%
“…At the time of writing this work, the authors of CaMN [Liu et al 2022e] had not provided the pre-trained generation model. Instead, they offered training code for a toy dataset and a pre-trained motion auto-encoder for the calculation of FGD.…”
Section: B Implementation Details of Baselines (mentioning)
confidence: 99%
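
For context on the FGD metric mentioned above: the Fréchet Gesture Distance is computed like the Fréchet Inception Distance, but over latent features of motion produced by a pre-trained auto-encoder such as the one the CaMN authors released. A minimal sketch, assuming `real_feats` and `gen_feats` are `(N, D)` feature arrays; the function name is illustrative:

```python
import numpy as np
from scipy import linalg

def frechet_gesture_distance(real_feats, gen_feats):
    # Fit a Gaussian (mean, covariance) to each feature set.
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)

    # Frechet distance between the two Gaussians:
    # ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2))
    diff = mu_r - mu_g
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop numerical-noise imaginary parts
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)
```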
“…Since the existing 3D hand prediction dataset is noisy due to the automated annotation process, we propose a prototypical memory bank to store realistic hand prototype representations encoded from real 3D hands. These 3D hands come from a studio-based motion-capture dataset named BEAT [22]. The reading and updating strategies of the hands prototypical memory are the same as in TMM.…”
Section: Stage Two: Diverse Sampling (mentioning)
confidence: 99%
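
The excerpt above names a prototypical memory bank but leaves its read and update rules to the paper's TMM module, which the excerpt does not describe. As a generic illustration only (not the cited paper's implementation), a common pattern is an attention-based read over learned prototypes with an exponential-moving-average update; every class name, dimension, and hyperparameter below is hypothetical:

```python
import torch
import torch.nn.functional as F

class PrototypeMemory(torch.nn.Module):
    """Generic prototypical memory: attention-based read, EMA update."""

    def __init__(self, num_prototypes=64, dim=256, momentum=0.99):
        super().__init__()
        self.register_buffer("protos", torch.randn(num_prototypes, dim))
        self.momentum = momentum

    def read(self, query):
        # query: (B, dim). Soft-attend over prototypes by cosine similarity.
        sim = F.normalize(query, dim=-1) @ F.normalize(self.protos, dim=-1).T
        attn = sim.softmax(dim=-1)   # (B, K) attention weights
        return attn @ self.protos    # (B, dim) retrieved prototype mix

    @torch.no_grad()
    def update(self, feats):
        # feats: (B, dim) features encoded from real hands. Each feature
        # pulls its nearest prototype toward itself with an EMA step.
        sim = F.normalize(feats, dim=-1) @ F.normalize(self.protos, dim=-1).T
        idx = sim.argmax(dim=-1)     # (B,) nearest prototype per feature
        for i, f in zip(idx.tolist(), feats):
            self.protos[i] = self.momentum * self.protos[i] + (1 - self.momentum) * f
```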
“…In recent years, the compelling performance of deep neural networks has prompted data-driven approaches. Previous studies establish large-scale speech-gesture corpora to learn the mapping from speech audio to human skeletons in an end-to-end manner [4,5,25,27,30,34,39]. To attain more expressive results, Ginosar et al [16] and Yoon et al [41] propose GAN-based methods that guarantee realism via an adversarial mechanism, where the discriminator is trained to distinguish real gestures from synthetic ones while the generator's objective is to fool the discriminator.…”
Section: Introduction (mentioning)
confidence: 99%
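
To make the adversarial mechanism in that excerpt concrete: the discriminator learns to label real gestures 1 and synthetic gestures 0, while the generator is trained to make the discriminator output 1 on its samples. A minimal single-step sketch with placeholder MLPs (real systems such as those of Ginosar et al. and Yoon et al. use sequence models over audio features and pose sequences); all dimensions and architectures here are illustrative:

```python
import torch
import torch.nn as nn

audio_dim, pose_dim = 128, 48  # hypothetical feature sizes
G = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU(), nn.Linear(256, pose_dim))
D = nn.Sequential(nn.Linear(pose_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def adversarial_step(audio, real_pose):
    # Discriminator step: real gestures -> 1, synthetic gestures -> 0.
    fake_pose = G(audio).detach()
    d_loss = (bce(D(real_pose), torch.ones(real_pose.size(0), 1))
              + bce(D(fake_pose), torch.zeros(fake_pose.size(0), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: fool the discriminator into outputting 1.
    g_loss = bce(D(G(audio)), torch.ones(audio.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

The `detach()` on the generator output keeps the discriminator update from back-propagating into the generator, which is the standard way to alternate the two objectives.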