2021
DOI: 10.1145/3450626.3459850
Driving-signal aware full-body avatars

Abstract: We present a learning-based method for building driving-signal aware full-body avatars. Our model is a conditional variational autoencoder that can be animated with incomplete driving signals, such as human pose and facial keypoints, and produces a high-quality representation of human geometry and view-dependent appearance. The core intuition behind our method is that better drivability and generalization can be achieved by disentangling the driving signals and remaining generative factors, which are not available…
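The following is a minimal sketch of the conditional-VAE idea described in the abstract: the decoder is conditioned directly on the driving signals (pose and facial keypoints), while a latent code absorbs the remaining generative factors that are unavailable at animation time. All names, dimensions, and layer choices here are illustrative assumptions, not the paper's architecture.

    # Hypothetical sketch of a driving-signal-conditioned VAE.
    # drive_dim, obs_dim, latent_dim and the MLP layers are assumptions.
    import torch
    import torch.nn as nn

    class DrivingSignalCVAE(nn.Module):
        def __init__(self, drive_dim=150, obs_dim=4096, latent_dim=64):
            super().__init__()
            # Encoder sees the full observation plus the driving signal and
            # produces a Gaussian posterior over the residual latent factors.
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim + drive_dim, 512), nn.ReLU(),
                nn.Linear(512, 2 * latent_dim),
            )
            # Decoder reconstructs geometry/appearance from driving signal + latent.
            self.decoder = nn.Sequential(
                nn.Linear(drive_dim + latent_dim, 512), nn.ReLU(),
                nn.Linear(512, obs_dim),
            )

        def forward(self, obs, drive):
            mu, logvar = self.encoder(torch.cat([obs, drive], dim=-1)).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
            recon = self.decoder(torch.cat([drive, z], dim=-1))
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
            return recon, kl.mean()

    # At animation time the residual latent can be sampled or zeroed out, so the
    # incomplete driving signals alone are enough to animate the avatar.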

Cited by 47 publications (10 citation statements)
References 69 publications

Citation statements, ordered by relevance:
“…As demonstrated in [47], the articulated MVP improves fidelity over mesh-based representations due to its volumetric nature while being computationally efficient for real-time rendering. Additionally, it only requires a coarse mesh from an LBS model as guidance, in contrast to prior mesh-based works [2,29], which rely on precise surface tracking.…”
Section: Preliminaries
confidence: 99%
“…Additionally, joint features are input at the lowest resolution level of the U-Net layer such that the resulting appearance explicitly accounts for pose-dependent texture changes, such as small wrinkles, that may not be represented by the primitive geometry. We use a spatially aligned joint feature encoder J_t(θ) ∈ ℝ^(64×64×64) in UV space as in [2]. Our loss is expressed by…”
confidence: 99%
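The excerpt above describes injecting a pose-dependent joint feature map at the lowest-resolution level of a U-Net operating in UV space. Below is a minimal sketch of that conditioning scheme, assuming a 256×256 UV-space input and the quoted 64×64×64 joint feature map; the two-level U-Net and specific layer sizes are illustrative assumptions, not the cited paper's implementation.

    # Hypothetical sketch: concatenate joint features at the U-Net bottleneck
    # so the decoded appearance can express pose-dependent detail (e.g. wrinkles).
    import torch
    import torch.nn as nn

    class BottleneckConditionedUNet(nn.Module):
        def __init__(self, in_ch=3, joint_ch=64, base_ch=32):
            super().__init__()
            self.enc1 = nn.Conv2d(in_ch, base_ch, 3, stride=2, padding=1)        # 256 -> 128
            self.enc2 = nn.Conv2d(base_ch, base_ch * 2, 3, stride=2, padding=1)  # 128 -> 64
            # Joint features are concatenated at the bottleneck resolution (64x64).
            self.bottleneck = nn.Conv2d(base_ch * 2 + joint_ch, base_ch * 2, 3, padding=1)
            self.dec2 = nn.ConvTranspose2d(base_ch * 2, base_ch, 4, stride=2, padding=1)  # 64 -> 128
            self.dec1 = nn.ConvTranspose2d(base_ch, in_ch, 4, stride=2, padding=1)        # 128 -> 256

        def forward(self, uv_input, joint_feat):
            # uv_input: (B, 3, 256, 256) UV-space input (size assumed)
            # joint_feat: (B, 64, 64, 64) spatially aligned joint feature map
            e1 = torch.relu(self.enc1(uv_input))                               # (B, 32, 128, 128)
            e2 = torch.relu(self.enc2(e1))                                     # (B, 64, 64, 64)
            b = torch.relu(self.bottleneck(torch.cat([e2, joint_feat], dim=1)))
            d2 = torch.relu(self.dec2(b)) + e1                                 # skip connection
            return self.dec1(d2)                                               # (B, 3, 256, 256)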
“…Modeling geometry of dynamic non-rigid scenes is considered in a number of recent approaches either for capturing human actors [5,54,29,27,30,35,16,20,17,59] or general scenes [32,45,41]. Towards this end, 3D scans are required for supervision in [20,17,35,54,29] to learn rigged human geometry.…”
Section: Related Work
confidence: 99%
“…Towards this end, 3D scans are required for supervision in [20,17,35,54,29] to learn rigged human geometry. Likewise, [32,30,16,5,48,27] utilize multi-view data to capture appearance and produce photo-realistic avatars under arbitrary viewpoints and in arbitrary poses. Several methods [59,45,41] use monocular videos but allow free-viewpoint rendering only.…”
Section: Related Work
confidence: 99%
“…A learning-based method for building driving-signal aware full-body avatars was presented in [27]. The model was a conditional variational auto-encoder that could be animated with incomplete driving signals, such as human pose and facial keypoints, and produced a high-quality representation of human geometry and view-dependent appearance.…”
Section: Introduction
confidence: 99%