“…Such methods (e.g., [1,47]) can typically produce high-quality results only for a limited class of videos, such as facial expressions (e.g., [41]) or robot hand movement (e.g., [10]), and cannot be applied to arbitrary videos. At the other end of the spectrum, single-video generative models have recently emerged [3,17]. These methods learn a generative video model from a single input video in an unsupervised manner.…”