Aligning Latent and Image Spaces to Connect the Unconnectable

Skorokhodov, Ivan; Sotnikov, Grigorii; Elhoseiny, Mohamed

doi:10.48550/arxiv.2104.06954

Cited by 6 publications

(14 citation statements)

References 53 publications


“…Further, observe that the face identity is well-preserved for unrelated edits and that local edits, such as those changing hairstyle and expression, do not alter unrelated image regions (e.g., expression is consistent across the "gender", "hi-top fade", and "tanned" edits). Notably, this disentanglement holds for other domains such as animal faces (AFHQv2 [13]) and landscapes (Landscapes HQ [63]). When editing animals, fur color, pose, and backgrounds are well-preserved under the various edits.…”

Section: Editing Via Non-linear Latent Paths (mentioning)

confidence: 97%


Third Time's the Charm? Image and Video Editing with StyleGAN3

Alaluf¹, Patashnik², Wu³, et al. 2022. Preprint.


StyleGAN is arguably one of the most intriguing and well-studied generative models, demonstrating impressive performance in image generation, inversion, and manipulation. In this work, we explore the recent StyleGAN3 architecture, compare it to its predecessor, and investigate its unique advantages, as well as drawbacks. In particular, we demonstrate that while StyleGAN3 can be trained on unaligned data, one can still use aligned data for training, without hindering the ability to generate unaligned imagery. Next, our analysis of the disentanglement of the different latent spaces of StyleGAN3 indicates that the commonly used W/W+ spaces are more entangled than their StyleGAN2 counterparts, underscoring the benefits of using the StyleSpace for fine-grained editing. Considering image inversion, we observe that existing encoder-based techniques struggle when trained on unaligned data. We therefore propose an encoding scheme that is trained solely on aligned data yet can still invert unaligned images. Finally, we introduce a novel video inversion and editing workflow that leverages the capabilities of a fine-tuned StyleGAN3 generator to reduce texture sticking and expand the field of view of the edited video. Code is available on our project page: https://yuval-alaluf.github.io/stylegan3-editing/.


“…7. Editing in S. We edit synthetic images using the StyleCLIP [48] global directions technique using StyleGAN3 generators trained on the FFHQ [35], AFHQv2 [13,34], and Landscapes HQ [63] datasets.…”

Section: Editing Via Non-linear Latent Paths (mentioning)

confidence: 99%


“…It is mostly popular for 3D reconstruction and geometry processing tasks (e.g., [35,39,43,45,50]), including video-based reconstruction [33,46,51,79]. Several recent projects explored the task of building generative models over such representations to synthesize images (e.g., [4,62,63]), 3D objects (e.g., [11,32,58]) or multi-modal signals (e.g., [15,16]), and our work extends this line of research to video generation.…”

Section: Related Work (mentioning)

confidence: 99%

“…It is noticeable on datasets where new content appears during a video, like Sky Timelapse or Rainbow Jelly. We believe it can be resolved using ideas similar to ALIS [63].…”

Section: A Limitations and Potential Negative Impact, A1 Limitations (mentioning)

confidence: 99%

“…An important design decision is the scaling of periods since at initialization it should cover both high-frequency and low-frequency details. Existing works use either exponential scaling $\sigma = (2\pi/2^{d}, 2\pi/2^{d-1}, \ldots)$ (e.g., [22,40,46,63]) or random scaling $\sigma \sim \mathcal{N}(0, \xi I)$ (e.g., [4,60,62,66]). In practice, we scale the i-th column of the amplitudes weight matrix with the value:…”

Section: B3 Additional Details On Positional Embeddings (mentioning)

confidence: 99%
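
The two scaling schemes named in the statement above can be illustrated with a minimal Python sketch. It is not the cited authors' implementation. The function names, the assumption that the exponential sequence runs down to 2π/2¹, and the reading of ξ as a per-component variance are all hypothetical choices made here for illustration. The value the quoted passage goes on to define is elided in the excerpt and is not reproduced.

```python
import math
import random


def exponential_periods(d):
    """Exponential scaling: 2*pi / 2**k for k = d, d-1, ..., 1.

    Assumption: the sequence ends at k = 1; the quoted text elides the tail.
    """
    return [2 * math.pi / 2 ** k for k in range(d, 0, -1)]


def random_scales(d, xi, seed=0):
    """Random scaling: d independent draws from N(0, xi).

    Assumption: xi is a variance (covariance xi * I), so std = sqrt(xi).
    """
    rng = random.Random(seed)
    std = math.sqrt(xi)
    return [rng.gauss(0.0, std) for _ in range(d)]


if __name__ == "__main__":
    # Smallest value first, doubling at each step.
    print(exponential_periods(4))
    # Values depend on the seed and are centered at zero.
    print(random_scales(4, xi=1.0))
```

The exponential scheme gives a fixed, evenly spaced ladder of scales on a log axis, while the random scheme samples scales whose spread is controlled by ξ.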


StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

Skorokhodov¹, Tulyakov², Elhoseiny³. 2021. Preprint.

Self Cite


Videos show continuous events, yet most, if not all, video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be, time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. Then, we explore the question of training on very sparse videos and demonstrate that a good generator can be learned by using as few as 2 frames per clip. After that, we rethink the traditional image and video discriminators pair and propose to use a single hypernetwork-based one. This decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on $1024^2$ videos for the first time. We build our model on top of StyleGAN2 and it is just ≈5% more expensive to train at the same resolution while achieving almost the same image quality. Moreover, our latent space features similar properties, enabling spatial manipulations that our method can propagate in time. We can generate arbitrarily long videos at an arbitrarily high frame rate, while prior work struggles to generate even 64 frames at a fixed rate. Our model achieves state-of-the-art results on four modern $256^2$ video synthesis benchmarks and one at $1024^2$ resolution.


Republic of Korea (South Korea)

Kim¹

The Crime of Aggression


Despite recent progress in semantic image synthesis, complete control over image style remains a challenging problem. Existing methods require reference images to feed style information into semantic layouts, which means that the style is constrained by the given image. In this paper, we propose a model named RUCGAN for user-controllable semantic image synthesis, which utilizes a singular color to represent the style of a specific semantic region. The proposed network achieves reference-free semantic image synthesis by injecting colors as user-desired styles into each semantic layout, and is able to synthesize semantic images with unusual colors. Extensive experimental results on various challenging datasets show that the proposed method outperforms existing methods, and we further provide an interactive UI to demonstrate the advantage of our approach for style controllability. The code and UI are available at: https://github.com/BenjaminJonghyun/RUCGAN


Aligning Latent and Image Spaces to Connect the Unconnectable

Abstract: Figure 1: Our method can generate infinite images of diverse and complex scenes that transition naturally from one into another. It does so without any conditioning and trains without any supervision from a dataset of unrelated square images.
