2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52729.2023.00978
Inversion-based Style Transfer with Diffusion Models

Yuxin Zhang,
Nisha Huang,
Fan Tang
et al.
Cited by 57 publications (5 citation statements)
References 24 publications
“…Generative models, including generative adversarial networks (GANs) [19] and variational auto-encoders (VAEs) [28], are now integral to AI-supported design processes (e.g., [6,45]). Simultaneously, diffusion models, renowned for their ability to produce rich and diverse samples, have found applications in areas such as artistic style transfer [73]. With its fast-evolving capabilities, AI has been playing an increasing role in human-AI collaboration for creative design [13,44].…”
Section: Human-AI Collaboration for Creative Design
confidence: 99%
“…Following the previous works on music style transfer (Alinoori and Tzerpos 2022; Cífka et al 2021), we evaluate our method based on two criteria: (a) content preservation and (b) style fit. Taking inspiration from MusicGen (Copet et al 2023) and InST (Zhang et al 2023b), we compute the CLAP cosine similarity between the generated mel-spectrograms and the content mel-spectrograms to evaluate content preservation. Additionally, we calculate the CLAP cosine similarity between the generated mel-spectrograms and the corresponding textual description of the style to evaluate style fit.…”
Section: Quantitative Evaluation
confidence: 99%
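The CLAP-based metric quoted above reduces to cosine similarity between embedding vectors. A minimal sketch of that computation, assuming embeddings are already available as vectors (the function name and the toy embeddings below are illustrative, not the cited papers' code; real scores would use the outputs of CLAP's audio and text encoders):

```python
import numpy as np

def clap_cosine_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, e.g. the CLAP
    embedding of a generated mel-spectrogram vs. that of the content
    mel-spectrogram (content preservation) or of the style text (style fit)."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

# Toy stand-in embeddings; in practice these would come from a CLAP model.
gen = np.array([0.2, 0.9, 0.1])
ref = np.array([0.1, 0.8, 0.3])
score = clap_cosine_similarity(gen, ref)  # in [-1, 1], higher = more similar
```

A higher score against the content embedding indicates better content preservation; a higher score against the style-text embedding indicates better style fit.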
“…The model requires retraining for different sets of images. Zhang et al [13] adopt a similar approach and apply the textual inversion method to image style transfer. However, retraining the network for each style can be time-consuming.…”
Section: Inversion of the Diffusion Model
confidence: 99%
“…Recently, with the development of Diffusion Models (DMs) [7][8][9][10], there have been a few attempts to use DMs to render content images via textual style conditions. With their outstanding ability to produce rich stylizations, many DM-based methods [11][12][13] produce high-quality results. Furthermore, text-driven image stylization is feasible through image editing techniques [14][15][16][17].…”
Section: Introduction
confidence: 99%