2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.01315

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++

Cited by 279 publications (270 citation statements)
References: 61 publications
“…(4) FID, which is an extension of the original Fréchet Inception Distance that calculates the distribution distance between estimated motions and the GT. FID is a standard metric in motion generation literature to evaluate the quality of generated motions [30,54,55,91]. Following prior work [55], we compute FID using the well-designed kinetic motion feature extractor in the fairmotion library [20].…”
Section: Methods (mentioning)
confidence: 99%
“…FID is a standard metric in motion generation literature to evaluate the quality of generated motions [30,54,55,91]. Following prior work [55], we compute FID using the well-designed kinetic motion feature extractor in the fairmotion library [20].…”
Section: Methods (mentioning)
confidence: 99%
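
As a rough illustration of the metric these statements describe, the Fréchet distance between two sets of feature vectors can be sketched as below, assuming each set is summarized by a Gaussian (mean and covariance). The function name `frechet_distance` and the random stand-in features are hypothetical. The cited works extract kinetic motion features with the fairmotion library, which this sketch does not use.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    Hypothetical helper for illustration, not the cited papers' code.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Stand-in features: random vectors in place of real motion features.
rng = np.random.default_rng(0)
generated = rng.normal(size=(200, 4))
ground_truth = rng.normal(loc=0.5, size=(200, 4))
score = frechet_distance(generated, ground_truth)
```

A lower score means the two feature distributions are closer. Two identical feature sets give a distance of zero up to floating-point error.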
“…The self attention mechanism of transformers provides a natural bridge to connect multimodal signals. Applications include audio enhancement [17,63], speech recognition [26], image segmentation [63,73], cross-modal sequence generation [21,37,38], video retrieval [20] and image/video captioning/classification [28,29,36,44,60,61]. A common paradigm (which we also adapt) is to use the output representations of single modality convolutional networks as inputs to the transformer [20,35].…”
Section: Related Work (mentioning)
confidence: 99%
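
The paradigm mentioned in this statement, feeding single-modality feature sequences to a transformer, can be sketched with one self-attention step over concatenated tokens. This is a minimal single-head sketch under stated assumptions: the random `audio_feats` and `motion_feats` arrays stand in for encoder outputs, and all names and sizes are hypothetical rather than taken from any cited model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 8
audio_feats = rng.normal(size=(5, d))   # stand-in for audio encoder outputs
motion_feats = rng.normal(size=(3, d))  # stand-in for motion encoder outputs

# Concatenate along the sequence axis so that every token can attend to
# tokens from both modalities.
tokens = np.concatenate([audio_feats, motion_feats], axis=0)
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
fused, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over all eight tokens, so a motion token can draw on audio tokens and vice versa.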
“…by conditioning network weights on phase, but they focus on cyclic motions. More recent methods [34,39,47,56] use attention [57].…”
Section: Related Work (mentioning)
confidence: 99%