HairCLIP: Design Your Hair by Text and Reference Image
Preprint, 2021
DOI: 10.48550/arxiv.2112.05142

Cited by 4 publications (6 citation statements, citing years 2021–2023); references 33 publications (73 reference statements).
“…With the development of the powerful cross-modal visual and language model CLIP [32], many recent efforts have started to study CLIP-guided image generation [8,15,19,30,36,37]. CLIPStyler [19] utilizes the well-aligned CLIP latent space for high-quality style transfer.…”
Section: Related Work
confidence: 99%
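CLIPStyler-style methods typically steer an image toward a textual style by comparing directions in CLIP's joint embedding space rather than raw embeddings. Below is a minimal sketch of that directional-loss idea, not any paper's exact implementation; it assumes OpenAI's `clip` package, and the input images are assumed to be already CLIP-preprocessed (N, 3, 224, 224) tensors.

```python
# Sketch of a directional CLIP loss for text-guided style transfer.
# Assumption: images are already CLIP-preprocessed tensors on `device`.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _preprocess = clip.load("ViT-B/32", device=device)

def directional_clip_loss(content_img, styled_img, source_text, target_text):
    """1 - cosine similarity between the image-space edit direction and the
    text-space edit direction, both measured in CLIP's joint embedding space."""
    with torch.no_grad():
        tokens = clip.tokenize([source_text, target_text]).to(device)
        t_src, t_tgt = model.encode_text(tokens).float().chunk(2)
    e_content = model.encode_image(content_img).float()
    e_styled = model.encode_image(styled_img).float()
    d_img = e_styled - e_content   # direction of the visual change
    d_txt = t_tgt - t_src          # direction of the textual change
    d_img = d_img / d_img.norm(dim=-1, keepdim=True)
    d_txt = d_txt / d_txt.norm(dim=-1, keepdim=True)
    return 1.0 - (d_img * d_txt).sum(dim=-1).mean()
```

Matching directions rather than absolute embeddings keeps content-specific features out of the loss, which is one reason this formulation transfers style without collapsing the image toward a single prototype of the target text.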
“…Otherwise, the similarity is minimized so that the model finds the most suitable paired images and texts. Multimodal learning has a promising future, as the innovation of CLIP has benefited many downstream tasks [10,43]. Other multimodal schemes are represented by Glide [30] and VilBERT [28], which target text-to-image generation and multimodal representation learning, respectively.…”
Section: Multimodal Learning
confidence: 99%
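For readers unfamiliar with how CLIP "finds the most suitable paired images and texts", the snippet below sketches zero-shot matching via cosine similarity in the joint embedding space, using OpenAI's `clip` package; the image file name and candidate captions are hypothetical.

```python
# Sketch of CLIP zero-shot image-text matching: embed one image and several
# candidate captions, then rank captions by cosine similarity.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("portrait.jpg")).unsqueeze(0).to(device)  # hypothetical file
texts = clip.tokenize(["a photo of curly hair",
                       "a photo of straight hair",
                       "a photo of a bowl cut"]).to(device)

with torch.no_grad():
    img_emb = model.encode_image(image).float()
    txt_emb = model.encode_text(texts).float()
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    sims = (img_emb @ txt_emb.T).squeeze(0)  # higher = better text match

print("best caption index:", sims.argmax().item())
print("similarities:", sims.tolist())
```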
“…To achieve zero-shot and open-vocabulary editing, the latest works set their sights on using pretrained multi-modality models as guidance. With the aligned image-text representation learned by CLIP, a few works, e.g., Wei et al. [2021], use text to extract latent edit directions with textually defined semantic meanings for separate input images. These works focus on extracting latent directions using a contrastive CLIP loss to conduct image manipulation tasks such as face editing (Wei et al. [2021]) and car editing. On the other hand, rather than editing the latent code, and observing the smoothness of the StyleGAN feature space, Gal et al. [2021] focus on fine-tuning the generator to transfer the feature domain.…”
Section: StyleGAN-based Editing
confidence: 99%
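The latent-direction extraction described above can be sketched as a small optimization loop: a residual offset in the generator's latent space is tuned under a CLIP loss, with an L2 penalty keeping the edit local. `G`, `clip_loss`, and all hyperparameters below are assumptions for illustration, not a specific paper's method; `clip_loss` could be the directional loss sketched earlier.

```python
# Sketch of extracting a text-guided latent edit direction for a pretrained
# generator. Assumptions: G maps latent codes w to images, and clip_loss(img,
# text) returns a differentiable CLIP-space alignment loss.
import torch

def find_edit_direction(G, w, clip_loss, target_text,
                        steps=200, lr=0.05, l2_weight=0.005):
    """Optimize a residual direction `delta` so that G(w + delta) moves toward
    `target_text` in CLIP space while staying close to the original G(w)."""
    delta = torch.zeros_like(w, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        img = G(w + delta)                            # regenerate with the shifted code
        loss = clip_loss(img, target_text)            # semantic alignment with the text
        loss = loss + l2_weight * delta.pow(2).sum()  # keep the edit small and local
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()
```

The L2 regularizer is what separates "editing" from "regeneration": without it, the optimizer is free to drift anywhere in latent space that satisfies the text, losing the input's identity. Fine-tuning approaches in the spirit of Gal et al. [2021] instead hold the latent code fixed and update the generator's weights under a similar CLIP objective.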