Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample

Gur, Shir; Benaim, Sagie; Wolf, Lior

doi:10.48550/arxiv.2006.12226

Cited by 7 publications

(24 citation statements)

References 28 publications


“…To evaluate the realism of generated frames, we use the FID metric [13] over each one. For temporal consistency, we adopt the recently proposed SVFID score introduced by Gur et al [12]. SVFID is an extension of FID for a single video, evaluating how the generated samples capture the temporal statistics of a single video, by using features from a pretrained action recognition network.…”

Section: Results (mentioning)

confidence: 99%
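
The FID-style scores named in the statement above compare feature statistics of real and generated samples. A minimal sketch of the underlying Fréchet distance between two Gaussians fitted to feature arrays follows. It assumes the features have already been extracted (by an Inception network for FID, or an action recognition network for SVFID, as the statement describes); the function name and the random stand-in features are hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr(sqrt(cov_a cov_b)) equals the sum of square roots of the eigenvalues
    # of the product, which are real and non-negative for PSD inputs.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Stand-in features (assumption): random vectors instead of network activations.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake_close = rng.normal(size=(500, 8))          # same distribution as `real`
fake_far = rng.normal(loc=3.0, size=(500, 8))   # shifted mean

d_same = frechet_distance(real, real)
d_close = frechet_distance(real, fake_close)
d_far = frechet_distance(real, fake_far)
```

Lower values indicate closer feature statistics, so `d_far` should come out much larger than `d_close`.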

JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting

Mokady, Tzaban, Benaim et al. 2021

Preprint

Self Cite


The task of unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks. While early works concentrated on specific object priors such as a human face or body, recent work considered the unsupervised case. When the source and target videos, however, are of different shapes, current methods fail. To alleviate this problem, we introduce JOKR, a JOint Keypoint Representation that captures the motion common to both the source and target videos, without requiring any object prior or data collection. By employing a domain confusion term, we enforce the unsupervised keypoint representations of both videos to be indistinguishable. This encourages disentanglement between the parts of the motion that are common to the two domains, and their distinctive appearance and motion, enabling the generation of videos that capture the motion of the one while depicting the style of the other. To enable cases where the objects are of different proportions or orientations, we apply a learned affine transformation between the JOKRs. This augments the representation to be affine invariant, and in practice broadens the variety of possible retargeting pairs. This geometry-driven representation enables further intuitive control, such as temporal coherence and manual editing. Through comprehensive experimentation, we demonstrate the applicability of our method to different challenging cross-domain video pairs. We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans. We also demonstrate superior temporal coherency and visual quality compared to state-of-the-art alternatives, through statistical metrics and a user study. Source code and videos can be found at: https://rmokady.github.io/JOKR/.


“…Throughout the experiments, we consider the following set of baselines: SinGAN [31], ConSin-GAN [13] and HP-VAE-GAN [8]. To evaluate image generation, we use the single-image FID metric [31].…”

Section: Methods (mentioning)

confidence: 99%

“…Previous methods [31,13,8] freeze each intermediate generator g_i except for the current training scale, ensuring each g_i to be independent. In our case, we freeze the projection of all previous scales, except the current scale.…”

Section: Reconstruction Loss Function (mentioning)

confidence: 99%

“…In the field of internal learning, one wishes to learn the internal statistics of a signal in order to perform various downstream tasks. In this work, we focus on Single image GANs [31,32,13,8], which present extremely impressive results in modeling the distribution of images that are similar to the input image, and in applying this distribution to a variety of applications. However, given that there is no shortage of unlabeled images, one may ask whether a better approach would be to model multiple images and only then condition the model on a single input image.…”

Section: Introduction (mentioning)

confidence: 99%

“…In this paper, we consider the meta-learning problem of learning to generate a variety of samples from a single image, where each individual learning problem is defined by this single image input. For this purpose, we first recall the setting of single-image generation as in [31,13,8].…”

Section: Introduction (mentioning)

confidence: 99%


Meta Internal Learning

Bensadoun, Gur, Galanti et al. 2021

Preprint

Self Cite


Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image. Since these models are trained on a single image, they are limited in their scale and application. To overcome these issues, we propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively. In the presented meta-learning approach, a single-image GAN model is generated given an input image, via a convolutional feedforward hypernetwork f. This network is trained over a dataset of images, allowing for feature sharing among different models, and for interpolation in the space of generative models. The generated single-image model contains a hierarchy of multiple generators and discriminators. It is therefore required to train the meta-learner in an adversarial manner, which requires careful design choices that we justify by a theoretical analysis. Our results show that the models obtained are as suitable as single-image GANs for many common image applications, significantly reduce the training time per image without loss in performance, and introduce novel capabilities, such as interpolation and feedforward modeling of novel images. Our code is available at: https://github.com/RaphaelBensTAU/MetaInternalLearning.


Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Models

Granot, Feinstein, Shocher et al. 2021

Preprint


Figure 1: Our simple unified framework covers a broad spectrum of single-image generative tasks, that usually require hours of training per image for GANs. Using patch nearest neighbors and a single source image, we can perform these tasks in a few seconds and with higher quality. We show here results obtained with our method for pivotal examples shown in SinGAN [23], InGAN [24], Structural analogies [3] and Bidirectional Similarity [26]. Additionally, we introduce novel applications such as Conditional-Inpainting. Input images marked in red.
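
The caption above refers to patch nearest neighbors on a single source image. As a rough illustration of the basic lookup only, the sketch below matches every patch of a query image to its closest source patch under squared Euclidean distance. The function names, the patch size, and the random grayscale inputs are assumptions made for the example; the cited method involves more than this single step, and this is not its implementation.

```python
import numpy as np

def extract_patches(img, p):
    """Return all overlapping p x p patches of a 2-D image as flat rows."""
    h, w = img.shape
    return np.stack([img[i:i + p, j:j + p].ravel()
                     for i in range(h - p + 1)
                     for j in range(w - p + 1)])

def nearest_patches(query, source, p=3):
    """For each query patch, find the index of the closest source patch."""
    q = extract_patches(query, p)
    s = extract_patches(source, p)
    # Pairwise squared distances via ||q||^2 - 2 q.s + ||s||^2.
    d2 = (q ** 2).sum(axis=1)[:, None] - 2.0 * q @ s.T + (s ** 2).sum(axis=1)[None, :]
    idx = d2.argmin(axis=1)
    return s[idx], idx

# Hypothetical inputs: a random 8x8 "source" and a noisy copy as the query.
rng = np.random.default_rng(0)
source = rng.normal(size=(8, 8))
query = source + 0.01 * rng.normal(size=(8, 8))

matched, idx = nearest_patches(query, source, p=3)
_, idx_self = nearest_patches(source, source, p=3)
```

With the query equal to the source, each patch should match itself, which gives a quick sanity check on the distance computation.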

