2021
DOI: 10.48550/arxiv.2112.03162
Preprint

Embedding Arithmetic for Text-driven Image Transformation

Abstract: Latent text representations exhibit geometric regularities, such as the famous analogy: queen is to king what woman is to man. Such structured semantic relations were not demonstrated on image representations. Recent works aiming at bridging this semantic gap embed images and text into a multimodal space, enabling the transfer of text-defined transformations to the image modality. We introduce the SIMAT dataset to evaluate the task of text-driven image transformation. SIMAT contains 6k images and 18k "transform…
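The analogy-style arithmetic the abstract describes can be sketched with toy vectors. This is a minimal illustration, not the paper's implementation: the names, the gallery, and the `lam` scaling parameter are assumptions for the example, and a real pipeline would obtain the embeddings from a CLIP-style multimodal encoder rather than from one-hot attribute vectors.

```python
import numpy as np

# Toy stand-ins for multimodal embeddings; in the paper's setting these
# would come from CLIP-style image/text encoders. Orthogonal basis
# vectors keep the example deterministic.
dog, cat, grass, snow = np.eye(4)

def embed(*parts):
    """Sum attribute vectors and L2-normalize onto the unit sphere."""
    v = np.sum(parts, axis=0)
    return v / np.linalg.norm(v)

# A small "gallery" of image embeddings to retrieve from.
gallery = {
    "dog on grass": embed(dog, grass),
    "dog on snow":  embed(dog, snow),
    "cat on grass": embed(cat, grass),
    "cat on snow":  embed(cat, snow),
}

def transform(image_emb, src_text_emb, tgt_text_emb, lam=1.0):
    # Embedding arithmetic: move the image embedding along the
    # text-defined direction (target minus source), then renormalize.
    v = image_emb + lam * (tgt_text_emb - src_text_emb)
    return v / np.linalg.norm(v)

# "dog on grass" + ("snow" - "grass") should retrieve "dog on snow".
query = transform(gallery["dog on grass"], embed(grass), embed(snow))
best = max(gallery, key=lambda name: float(gallery[name] @ query))
print(best)  # -> dog on snow
```

Retrieval by cosine similarity over the normalized gallery (here just a dot product) is what lets a text-defined edit transfer to the image modality: the transformed embedding lands nearest the image that realizes the target caption.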

Cited by 1 publication (1 citation statement)
References 22 publications (36 reference statements)
“…By training with large-scale data, CLIP performs with high accuracy on tasks such as zero-shot text-to-image retrieval without additional training. The SIMAT Dataset [29] uses the latent space of CLIP to perform semantic operations on images and text.…”
Section: Multi-modal Representations By Contrastive Learning
confidence: 99%