2017
DOI: 10.48550/arxiv.1704.05831
Preprint

Learning to Generate Long-term Future via Hierarchical Prediction

Cited by 20 publications
(37 citation statements)
“…We leverage the 3D Dynamic Prediction (3DDP) network to predict 3DMM dynamics, which can be used for 3D face reconstruction and texture mapping to provide valuable information for face generation. Different with the method [52] that only learns to predict a reasonable future from the initial state, we focus on three tasks, i.e., face video retargeting, video prediction the same as [52] and target-driven video prediction that "imagines" the temporal changes via completing missing frames between the given source and target face images. For the retargeting task, we use a reference video to offer the sequential changes, rather than using 3DDP to predict the future changes.…”
Section: B 3d Dynamic Prediction
Citation type: mentioning
confidence: 99%
“…Subsequent work that explicitly considered such basic image structure yielded improved results [96,104]; however, prediction horizons were little extended. To support longer term predictions, work has relied on higher-level image analysis (e.g., body keypoint and motion analysis [114,158,159]). Other work has made use of stochastic generative models to capture uncertainty [5,84,157].…”
Section: Overview Of Video Predictive Understanding
Citation type: mentioning
confidence: 99%
“…The above-mentioned approaches all modeled each human as one rigid object, ignoring the local movements of human body joints. To pay attention on both moving speed and pose variances of people, the approaches [21], [22] took human skeletons as the input to RNN and modeled human movements in this way. Recently, paper [23] proposed to divide human skeletons into five parts and feed them into five independent RNNs for feature extraction.…”
Section: B Human Action Recognition In Videos
Citation type: mentioning
confidence: 99%