“…Facing this challenge, most of prior researches apply the generative model to represent multimodality by a latent variable. For instance, some methods [6,9,12,19,37,43,54] utilize generative adversarial networks (GANs) to spread the distribution over all possible future trajectories, while other methods [3,16,20,25,38,46] exploit conditional variational auto-encoder (CVAE) to encode the multi-modal distribution of future trajectories. Despite the remarkable progress, these methods still face inherent limitations, e.g., training process could be unstable for GANs due to adversarial learning, and CVAE tends to produce unnatural trajectories.…”