“…The uncertainty is usually encoded as a sequence of latent variables, which are then used in a generative model such as GAN [12] based [27,34,5,31], or, similar to ours, VAE [20] based [35,6]. These methods [11,6,41] often leverage an input sequence instead of a single frame, which helps reduce the ambiguities. Further, the latent variables are either per-timestep [6], or global [1,41] whereas our model leverages a global latent variable, which in turn induces per-timestep variables.…”