Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes

Wang, Jiashun; Xu, Huazhe; Xu, Jingwei; Liu, Sifei; Wang, Xiaolong

doi:10.1109/cvpr46437.2021.00928

Cited by 76 publications

(27 citation statements)

References 53 publications

“…Hassan et al. [17] estimate a "goal" position and interaction direction on an object, plan a 3D path from a "start" body pose to this, and finally generate a sequence of body poses with an autoregressive cVAE for walking and interacting, e.g., sitting on a chair. Wang et al. [58] first estimate several "sub-goal" positions and bodies, divide these into short start/end pairs to synthesize short-term motions, and finally stitch these together in a long motion with an optimization process.…”

Section: Related Work; type: mentioning

confidence: 99%

“…Specifically, we optimize over SMPL-X pose, θ, and translation, t, initialized with GNet's predictions. Instead of hand-crafted contact constraints [8,19,58] during optimization, we use data-driven constraints generated from GNet. Specifically, we use: (1) hand-to-object vertex offsets, (2) head-orientation and (3) pose coupling to the initial value, and (4) foot-ground penetration.…”

Section: "Goal" Network (Gnet); type: mentioning

confidence: 99%

GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping

Taheri, Choutas, Black, et al. 2021. Preprint.

“…A significant body of work learns scene affordances, such as where a person can stand or sit, from observing data of humans [17,18,25,27,32,33,42,54,77]. Overlapping areas of work focus on human interactions with objects [12,29,52,80,85] or synthesize human motion conditioned on an input scene [10,53,74]. We propose the reverse task of hallucinating a scene conditioned on pose.…”

Section: Related Work; type: mentioning

confidence: 99%

Hallucinating Pose-Compatible Scenes

Brooks, Efros. 2021. Preprint.

What does human pose tell us about a scene? We propose a task to answer this question: given human pose as input, hallucinate a compatible scene. Subtle cues captured by human pose (action semantics, environment affordances, object interactions) provide surprising insight into which scenes are compatible. We present a large-scale generative adversarial network for pose-conditioned scene generation. We significantly scale the size and complexity of training data, curating a massive meta-dataset containing over 19 million frames of humans in everyday environments. We double the capacity of our model with respect to StyleGAN2 to handle such complex data, and design a pose conditioning mechanism that drives our model to learn the nuanced relationship between pose and scene. We leverage our trained model for various applications: hallucinating pose-compatible scene(s) with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating pose. Our model produces diverse samples and outperforms pose-conditioned StyleGAN2 and Pix2Pix baselines in terms of accurate human placement (percent of correct keypoints) and image quality (Fréchet inception distance).

“…Early methods retrieve and integrate existing avatar motions from database to make them compatible with scene geometry [4,33,38,39,61]. Given a goal pose or a task, more recent works leverage deep learning methods to search for a possible motion path and estimate plausible contact motions [6,8,24,42,62,64]. These methods all explore human-scene interaction understanding by estimating object functionalities or human interactions as poses in a given 3D scene environment.…”

Section: Related Work; type: mentioning

confidence: 99%

Pose2Room: Understanding 3D Scenes from Human Activities

Nie, Dai, Han, et al. 2021. Preprint.

Figure 1. From an observed pose trajectory of a person performing daily activities in an indoor scene (left), we learn to estimate likely object configurations of the scene underlying these interactions, as a set of object class labels and oriented 3D bounding boxes (middle). By sampling from our probabilistic decoder, we synthesize multiple plausible object arrangements (right). (Scene geometry is shown only for visualization.)
