2021
DOI: 10.48550/arxiv.2106.00178
Preprint
Language-Driven Image Style Transfer

Abstract: Despite its promising results, style transfer, which requires preparing style images in advance, can limit creativity and accessibility. Following human instructions, on the other hand, is the most natural way to perform artistic style transfer and can significantly improve controllability for visual-effect applications. We introduce a new task, language-driven image style transfer (LDIST), to manipulate the style of a content image guided by a text. We propose contrastive language visual artist …

Cited by 3 publications (3 citation statements)
References 100 publications (90 reference statements)
“…Multimodal learning has come into prominence recently, with text-to-image synthesis [53,12,57] and image-text contrastive learning [49,31,74] at the forefront. These models have transformed the research community and captured widespread public attention with creative image generation [22,54] and editing applications [21,41,34]. To pursue this research direction further, we introduce Imagen, a text-to-image diffusion model that combines the power of transformer language models (LMs) [15,52] with high-fidelity diffusion models [28,29,16,41] to deliver an unprecedented degree of photorealism and a deep level of language understanding in text-to-image synthesis.…”
Section: Introduction
confidence: 99%
“…Style Transfer. Without requiring training or inversion of generative models, CLVA [196] manipulates the style of a content image through text prompts, comparing contrastive pairs of content images and style instructions to capture their mutual relativeness. However, CLVA is constrained in that it requires style images accompanied by text prompts during training.…”
Section: Other Methods
confidence: 99%
“…Style Transfer. CLVA [165] proposes to manipulate the style of a content image through text guidance, comparing contrastive pairs of content images and style instructions to capture their mutual relativeness. CLIPstyler [166] proposes text-guided style transfer by training a lightweight network that transforms a content image to follow the text condition, matching the similarity between the CLIP embeddings of the stylized output and the text.…”
Section: Other Methods
confidence: 99%
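The citation statements above describe text-guided style transfer losses built on joint image-text embeddings: the stylized image is pushed toward the text instruction in embedding space. The following is a minimal illustrative sketch of such a similarity-based loss, not the authors' actual method; the toy list embeddings stand in for outputs of a pretrained encoder such as CLIP.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors given as lists.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def text_style_loss(image_emb, text_emb):
    # Loss is 0 when the stylized image's embedding points in the same
    # direction as the text instruction's embedding, and grows as the
    # two diverge; minimizing it steers the stylization toward the text.
    return 1.0 - cosine_similarity(image_emb, text_emb)

# Toy stand-in embeddings (hypothetical values, not real encoder outputs).
image_emb = [0.8, 0.1, 0.3]
text_emb = [0.7, 0.2, 0.4]
loss = text_style_loss(image_emb, text_emb)
```

In a real pipeline, a lightweight stylization network would be trained by backpropagating a loss of this form through differentiable image and text encoders, optionally combined with a content-preservation term.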