2023
DOI: 10.48550/arxiv.2301.13826
Preprint

Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

Abstract: Figure 1. Given a pre-trained text-to-image diffusion model (e.g., Stable Diffusion [39]), our method, Attend-and-Excite, guides the generative model to modify the cross-attention values during the image synthesis process to generate images that more faithfully depict the input text prompt. Stable Diffusion alone (top row) struggles to generate multiple objects (e.g., a horse and a dog). However, by incorporating Attend-and-Excite (bottom row) to strengthen the subject tokens (marked in blue), we achieve images t…
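The attention-strengthening idea described in the caption can be sketched roughly as follows. This is a minimal, hypothetical NumPy illustration of an Attend-and-Excite-style loss, not the paper's implementation: the real method operates on Stable Diffusion's cross-attention maps and updates the noisy latent by gradient descent at each denoising step, and all function and variable names here are our own.

```python
import numpy as np

def attend_and_excite_loss(attn, subject_token_ids):
    """Sketch of an Attend-and-Excite-style loss: for each subject token,
    take its maximum spatial attention; the loss is determined by the most
    neglected subject token, i.e. the one with the smallest peak activation."""
    # attn: (num_pixels, num_tokens) cross-attention map, values in [0, 1]
    per_token_loss = [1.0 - attn[:, t].max() for t in subject_token_ids]
    return max(per_token_loss)

# Toy example: 4 spatial positions, 3 tokens; token 2 is barely attended.
attn = np.array([
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.7, 0.1],
])
print(attend_and_excite_loss(attn, [1]))     # well-attended token -> small loss
print(attend_and_excite_loss(attn, [1, 2]))  # neglected token 2 dominates the loss
```

In the method the caption describes, a loss of this shape would be backpropagated to the latent during synthesis, "exciting" the attention of whichever subject token is currently most neglected so that every subject in the prompt ends up depicted.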

Cited by 6 publications (12 citation statements)
References 34 publications (56 reference statements)
“…As shown in Fig. A14, when Cones incorporates the Attend-and-Excite (Feng et al., 2022; Chefer et al., 2023) method to address this issue, it generates better results.…”
Section: C5 More Results On Multi Subjects
mentioning confidence: 99%
“…The remarkable advances in this area are driven by the application of state-of-the-art image-generative models, such as auto-regressive (Ramesh et al. 2021; Wang et al. 2022) and diffusion models (Ramesh et al. 2022; Saharia et al. 2022; Rombach et al. 2022), as well as the availability of large-scale language-image datasets (Sharma et al. 2018; Schuhmann et al. 2022). However, existing methods face challenges in synthesizing or editing multiple subjects with specific relational and attributive constraints from textual prompts (Chefer et al. 2023). The typical defects that occur in the synthesis results are missing entities and inaccurate inter-object relations, as shown in ??.…”
Section: Introduction
mentioning confidence: 99%
“…The typical defects that occur in the synthesis results are missing entities and inaccurate inter-object relations, as shown in ??. Existing work improves the compositional skills of text-to-image synthesis models by incorporating linguistic structures (Feng et al. 2022) and attention controls (Hertz et al. 2022; Chefer et al. 2023) within the diffusion guidance process. Notably, Structured Diffusion (Feng et al. 2022) parses a text prompt to extract its noun phrases, while Attend-and-Excite (Chefer et al. 2023) strengthens the attention activations associated with the most marginalized subject token.…”
Section: Introduction
mentioning confidence: 99%
“…text-guided solutions have emerged in the field of image editing and produced impressive results [21,25,40,7,17]. The powerful generative capabilities of diffusion models enable the generation of numerous high-quality images.…”
Section: Layered Controlled Optimization Fine-tuning
mentioning confidence: 99%
“…These models can generate high-quality synthetic images based on text prompts, enabling text-guided image editing and producing impressive results. As a result, numerous text-based image editing methods [36,13,10,7,28,8,35] have emerged and evolved. However, such models cannot mimic specific subject characteristics.…”
Section: Introduction
mentioning confidence: 99%