2017
DOI: 10.48550/arxiv.1711.09561
Preprint

HP-GAN: Probabilistic 3D human motion prediction via GAN

Cited by 7 publications
(16 citation statements)
“…Human dynamics can be best modeled via global motion and detailed local joint locations (referred to as human skeleton or pose) [29]- [33]. Human dynamics are previously modelled in images [34] by two major types of methods in videos including state transition models (such as graphical models) [35], [36] or more recently sequence-to-sequence deep learning methods [1], [2], [7], [32], [37]- [39]. Chao et al [34] introduced a method for pose forecasting in 3D on static images, Barsoum et al [39] used Wasserstein GAN [40] in a probabilistic setting, Walker et al [38] used variational autoencoders, Fragkiadaki et al [41] proposed architectures based on LSTM and Encoder-Recurrent-Decoder methods, Yan et al [42] and Zhao et al [43] proposed methods that could predict longer into the future, Martinez et al [1] introduced a designed RNN for human pose prediction, Chiu et al [32] utilized a multi-layer hierarchical recurrent architecture, and Wang et al proposed to use imitation learning and specifically Generative Adversarial Imitation Learning (GAIL) [44] to capture human dynamics.…”
Section: B Motion and Pose Forecasting (mentioning)
confidence: 99%
“…poses in videos [5-7, 19, 26, 47]. Chao et al [6] proposed a 3D Pose Forecasting Network on static images; Barsoum et al [5] took a probabilistic approach for pose prediction using Wasserstein GAN [2]; Walker et al [47] proposed a variational autoencoder solution; Fragkiadaki et al [9] proposed two architectures denoted by LSTM-3LR (3 layers of LSTM cells) and ERD (Encoder-Recurrent-Decoder); Yan et al [51] and Zhao et al [53] proposed methods for longer time prediction; Martinez et al [26] used a carefully tailored RNN to learn human motion prediction; and Chiu et al [7] proposed a multi-layer hierarchical RNN architecture (denoted by TP-RNN) to capture human dynamics. In contrast, instead of training a fully supervised model, in this work we introduce the unsupervised GAIL framework into our training process to enhance the generalizability of our learned prediction policy.…”
Section: Related Work (mentioning)
confidence: 99%
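The statements above credit Barsoum et al. (HP-GAN) with a probabilistic approach to pose prediction built on a Wasserstein GAN. As a rough illustration of that general idea only, the sketch below shows the generic WGAN critic and generator losses plus noise-conditioned sampling of several candidate futures for one observed past. It is not HP-GAN's actual architecture or training objective: `toy_generator` and `sample_futures` are hypothetical helpers, poses are assumed to be flat lists of joint coordinates, and the weight clipping follows the original WGAN formulation rather than any specific variant the paper may use.

```python
import random
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Generic WGAN critic loss to minimize: E[D(fake)] - E[D(real)]."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """Generic WGAN generator loss to minimize: -E[D(fake)]."""
    return -fmean(fake_scores)


def clip_weights(weights, c=0.01):
    """Original-WGAN weight clipping: keep each critic weight in [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


def toy_generator(past_poses, z):
    """Hypothetical stand-in for a learned generator (not HP-GAN's network).

    Predicts one future pose by offsetting the last observed pose with the
    noise vector z, so different z values yield different futures.
    """
    last = past_poses[-1]
    return [x + 0.1 * z_i for x, z_i in zip(last, z)]


def sample_futures(past_poses, n_samples, seed=0):
    """Draw several candidate futures for the same past by resampling z."""
    rng = random.Random(seed)
    dim = len(past_poses[-1])
    return [
        toy_generator(past_poses, [rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(n_samples)
    ]


# Usage with made-up critic scores and a 3-value toy "pose".
print(critic_loss([0.8, 0.6], [0.1, -0.1]))
print(generator_loss([0.1, -0.1]))
past = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3]]
futures = sample_futures(past, n_samples=3)
print(len(futures), len(futures[0]))
```

Because the noise vector is resampled per call, the same observed past maps to a set of distinct futures; that set, rather than a single point estimate, is what "probabilistic" prediction refers to here.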
“…As such, to create machines that can interact with humans seamlessly, it is very important to convey the ability of predicting short- and long-term future of human dynamics based on the immediate present and past. Recently, computer vision researchers attempted predicting human dynamics from images [11], or through time in videos [8,31,37]. Human dynamics are mainly defined as a set of structured body joints, known as poses [33].…”
Section: Introduction (mentioning)
confidence: 99%
“…Ghosh et al [16], with reference to [23], attributed this finding to the side-effects of curriculum learning (such as in [11]), commonly practiced for temporal forecasting. With such observations, some previous works focused on short-term forecasting of human poses [18,23], and some others exclusively aimed attention at long-term predictions [8,16,36]. However, most of the previous methods achieve reasonable performance by incorporating action labels as extra data annotation in their models, i.e., they either trained pose forecasters on each action class separately (e.g., [15,18]) or incorporated the action labels as an extra input to the model and concluded that including action labels improves the results (e.g., [23]).…”
Section: Introduction (mentioning)
confidence: 99%