2017
DOI: 10.48550/arxiv.1705.02082
Preprint
Motion Prediction Under Multimodality with Conditional Stochastic Networks

Abstract: Given a visual history, multiple future outcomes for a video scene are equally probable; in other words, the distribution of future outcomes has multiple modes. Multimodality is notoriously hard to handle by standard regressors or classifiers: the former regress to the mean and the latter discretize a continuous high-dimensional output space. In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or fram…
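The abstract's core idea can be illustrated with a toy sketch: a deterministic regressor maps one history to one future (and so averages over modes), whereas a stochastic network draws a fresh latent variable per sample and can emit multiple distinct futures for the same history. Everything below (dimensions, weights, the `sample_future` helper) is hypothetical and illustrative, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conditional stochastic predictor (illustrative only): the latent z,
# redrawn per sample, shifts the output between modes, where a plain
# regressor would collapse to a single mean prediction.
HIST_DIM, LATENT_DIM, HORIZON = 8, 4, 5

W_h = rng.normal(size=(HORIZON, HIST_DIM))    # history -> future (shared)
W_z = rng.normal(size=(HORIZON, LATENT_DIM))  # latent  -> future (mode shift)

def sample_future(history, n_samples=3):
    """Draw n_samples future trajectories conditioned on one history."""
    futures = []
    for _ in range(n_samples):
        z = rng.normal(size=LATENT_DIM)  # one latent draw per sample
        futures.append(np.tanh(W_h @ history + W_z @ z))
    return np.stack(futures)             # shape (n_samples, HORIZON)

history = rng.normal(size=HIST_DIM)
futures = sample_future(history)
```

Because each call to `sample_future` redraws `z`, the returned trajectories differ from one another — the sampling itself is what represents the multiple modes.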

Cited by 7 publications (12 citation statements)
References 23 publications
“…The uncertainty is usually encoded as a sequence of latent variables, which are then used in a generative model such as GAN [12] based [27,34,5,31], or, similar to ours, VAE [20] based [35,6]. These methods [11,6,41] often leverage an input sequence instead of a single frame, which helps reduce the ambiguities. Further, the latent variables are either per-timestep [6] or global [1,41], whereas our model leverages a global latent variable, which in turn induces per-timestep variables.…”
Section: Related Work (mentioning, confidence: 99%)
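The "global latent variable, which in turn induces per-timestep variables" structure from the quote above can be sketched minimally: sample one latent for the whole clip, then unroll it through a recurrence to get a latent per timestep. The dimensions, transition matrix `A`, and `tanh` recurrence are all hypothetical stand-ins, not the model described in the citing paper.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT, T = 4, 6

z_global = rng.normal(size=LATENT)           # sampled once for the whole clip
A = 0.5 * rng.normal(size=(LATENT, LATENT))  # hypothetical transition weights

per_step = []
z_t = z_global
for t in range(T):
    # each step's latent is induced from the previous one, seeded by z_global
    z_t = np.tanh(A @ z_t)
    per_step.append(z_t)
per_step = np.stack(per_step)                # shape (T, LATENT)
```

One draw of `z_global` thus fixes an entire trajectory of per-timestep latents, in contrast to models that sample an independent latent at every step.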
“…Authors focused on evaluating the linearization properties, yet the model was not contrasted to previous works. Extending [92], [186], Fragkiadaki et al [177] proposed several architectural changes and training schemes to handle marginalization over stochastic variables, such as sampling from the prior and variational inference. Their stochastic ED architecture predicts future optical flow, i.e., dense pixel motion field, used to spatially transform the current frame into the next frame prediction.…”
Section: Incorporating Uncertainty (mentioning, confidence: 99%)
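The flow-based prediction described above — predict a dense pixel motion field, then spatially transform the current frame into the next-frame prediction — reduces to warping an image by a flow field. Below is a minimal nearest-neighbour warp as a stand-in for the bilinear sampling such models typically use; the function name and the (dy, dx) flow convention are assumptions for illustration.

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Warp frame by a dense flow field (nearest-neighbour sampling).

    frame: (H, W) array; flow: (H, W, 2) array of (dy, dx) displacements
    giving, for each output pixel, where to sample from in the input frame.
    """
    H, W = frame.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return frame[src_y, src_x]

frame = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0  # every pixel samples from one column to its right
warped = warp_with_flow(frame, flow)
```

In a learned model the flow field is the network's output; the warp itself is a fixed, differentiable (in the bilinear case) sampling operation, which is what lets the next-frame loss train the flow predictor end to end.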
“…Using future frames as ground truth leads to a conditioned supervised learning approach, which gives better results in contrast to unconditional video generation [8,18,28,39]. GAN-based approaches often rely on a sequence of input frames as priors to reduce ambiguity [15,19,62,71]. Our approach uses only the first input frame and action class name as prior for the prediction task, similar to [28,60].…”
Section: Related Work (mentioning, confidence: 99%)