2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00928
Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes

Cited by 76 publications (27 citation statements)
References 53 publications
“…Hassan et al [17] estimate a "goal" position and interaction direction on an object, plan a 3D path from a "start" body pose to this, and finally generate a sequence of body poses with an autoregressive cVAE for walking and interacting, e.g., sitting on a chair. Wang et al [58] first estimate several "sub-goal" positions and bodies, divide these into short start/end pairs to synthesize short-term motions, and finally stitch these together in a long motion with an optimization process.…”
Section: Related Work
confidence: 99%
“…Specifically, we optimize over SMPL-X pose, θ, and translation, t, initialized with GNet's predictions. Instead of hand-crafted contact constraints [8,19,58] during optimization, we use data-driven constraints generated from GNet. Specifically, we use: (1) hand-to-object vertex offsets, (2) head-orientation and (3) pose coupling to the initial value, and (4) foot-ground penetration.…”
Section: "Goal" Network (GNet)
confidence: 99%
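The statement above describes optimizing SMPL-X pose and translation under four data-driven constraints. A minimal sketch of such a combined objective is given below; the loss forms, weights, and argument names are illustrative assumptions, not the cited paper's actual implementation.

```python
def total_loss(theta, theta_init,
               hand_offsets, hand_offsets_goal,
               head_dir, head_dir_goal,
               foot_heights,
               w=(1.0, 0.1, 0.5, 10.0)):
    """Hypothetical weighted sum of the four constraint terms described above."""
    # (1) hand-to-object vertex offsets: match current offsets to the goal offsets
    l_hand = sum((a - b) ** 2 for a, b in zip(hand_offsets, hand_offsets_goal))
    # (2) head orientation: 1 - cosine between current and goal (unit) directions
    l_head = 1.0 - sum(a * b for a, b in zip(head_dir, head_dir_goal))
    # (3) pose coupling: keep the optimized pose close to its initialization
    l_pose = sum((a - b) ** 2 for a, b in zip(theta, theta_init))
    # (4) foot-ground penetration: penalize foot vertices below the ground (h < 0)
    l_foot = sum(min(h, 0.0) ** 2 for h in foot_heights)
    return w[0] * l_hand + w[1] * l_head + w[2] * l_pose + w[3] * l_foot
```

Minimizing this with any gradient-based optimizer, starting from the network's predictions, mirrors the initialize-then-refine scheme the citation describes.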
“…A significant body of work learns scene affordances, such as where a person can stand or sit, from observing data of humans [17,18,25,27,32,33,42,54,77]. Overlapping areas of work focus on human interactions with objects [12,29,52,80,85] or synthesize human motion conditioned on an input scene [10,53,74]. We propose the reverse task of hallucinating a scene conditioned on pose.…”
Section: Related Work
confidence: 99%
“…Early methods retrieve and integrate existing avatar motions from a database to make them compatible with scene geometry [4,33,38,39,61]. Given a goal pose or a task, more recent works leverage deep learning methods to search for a possible motion path and estimate plausible contact motions [6,8,24,42,62,64]. These methods all explore human-scene interaction understanding by estimating object functionalities or human interactions as poses in a given 3D scene environment.…”
Section: Related Work
confidence: 99%