2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01754

HairCLIP: Design Your Hair by Text and Reference Image

Cited by 52 publications (43 citation statements)
References 37 publications
“…Nevertheless, it still heavily relies on pre-trained generative models. HairCLIP [23] achieves disentangled hair editing by feeding separate hairstyle and hair color information into different sub hair mappers to map the input conditions into corresponding latent code changes.…”
Section: Text-driven Image Manipulation (mentioning)
confidence: 99%
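To make the disentanglement described above concrete, here is a minimal PyTorch sketch: two independent mappers, one conditioned on a hairstyle embedding and one on a hair color embedding, each emit a latent offset that is added to the input latent code. All module and variable names are illustrative assumptions, not HairCLIP's actual implementation, which operates on StyleGAN W+ latents with CLIP-conditioned sub-mappers.

```python
import torch
import torch.nn as nn

class SubMapper(nn.Module):
    """Maps a latent code plus a CLIP condition embedding to a latent offset."""
    def __init__(self, latent_dim=512, cond_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, latent_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, w, cond):
        return self.net(torch.cat([w, cond], dim=-1))

# Hypothetical disentangled editing: separate mappers for hairstyle and color.
hairstyle_mapper = SubMapper()
color_mapper = SubMapper()

w = torch.randn(1, 512)        # latent code of the input face (e.g. from a GAN inversion)
e_style = torch.randn(1, 512)  # CLIP embedding of a hairstyle description
e_color = torch.randn(1, 512)  # CLIP embedding of a hair color description

# Each condition contributes its own latent change; because the offsets come
# from separate mappers, the hairstyle and color edits stay disentangled.
w_edited = w + hairstyle_mapper(w, e_style) + color_mapper(w, e_color)
```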
“…However, due to the advantage of ViT on pre-training, the ViT encoder outperformed the CNN encoder, and it is most commonly applied in other works [9]. CLIP has recently been used in various tasks, such as e-commerce image retrieval [9], text-image generation [10], and image segmentation [18]. However, CLIP still lacks the ability to effectively match local information in images to their descriptions in cross-modal information retrieval tasks [12].…”
Section: The Network of CLIP (mentioning)
confidence: 99%
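Both encoder families mentioned in the excerpt are available in the openai/clip package; a short illustration, using the package's own model identifiers:

```python
import clip

print(clip.available_models())        # lists both ResNet (CNN) and ViT backbones
cnn_model, _ = clip.load("RN50")      # CNN (ResNet-50) image encoder
vit_model, _ = clip.load("ViT-B/32")  # ViT image encoder, the variant most works adopt
```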
“…Recently, Radford et al [8] proposed a Contrastive Language-Image Pre-training network (CLIP) which achieved state-of-the-art performance in cross-modal tasks. CLIP employs the pre-trained GPT-2 and Visual Transformer (ViT) to encode descriptions and images into a shared embedding space respectively [8], and has been widely applied to various information retrieval tasks such as e-commerce image retrieval [9], and text-image generation [10]. One of CLIP's limitations is that it cannot identify relations between objects in images.…”
Section: Introduction (mentioning)
confidence: 99%
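The shared embedding space described in this excerpt can be exercised in a few lines with the openai/clip package, following its standard usage: encode an image and several candidate descriptions, then rank the descriptions by cosine similarity. The image filename and prompts below are illustrative assumptions.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # ViT image encoder

image = preprocess(Image.open("face.jpg")).unsqueeze(0).to(device)  # hypothetical input image
texts = clip.tokenize(["a person with curly hair",
                       "a person with straight hair"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Cosine similarity in the shared embedding space ranks the
    # descriptions against the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).softmax(dim=-1)

print(similarity)  # probability-like scores, one per description
```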
“…With the successful development of cross-modal visual and linguistic representations [30,42,43,54], especially the omnipotent CLIP [35], many efforts [7, 18,23,34,46,49,51] have recently started investigating text-driven image manipulation. However, there are no existing methods specifically for image restoration.…”
Section: Text-driven Image Manipulation (mentioning)
confidence: 99%
“…However, there are no existing methods specifically for image restoration. Among these works, the most relevant ones are StyleCLIP [34], HairCLIP [49], and CLIP-Styler [23]. StyleCLIP performs attribute manipulation with exploring learned latent space of StyleGANv2 [21].…”
Section: Text-driven Image Manipulation (mentioning)
confidence: 99%
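A minimal sketch of the latent-space manipulation StyleCLIP performs: optimize a latent code so that the generated image's CLIP embedding matches a text prompt. A toy generator stands in for the pretrained StyleGANv2 that StyleCLIP actually uses, so only the structure of the optimization loop is meaningful here.

```python
import torch
import torch.nn as nn
import clip

# Stand-in generator: a real setup would load a pretrained StyleGANv2; this
# tiny module exists only so the loop below runs end to end.
class ToyGenerator(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.to_img = nn.Linear(latent_dim, 3 * 224 * 224)

    def forward(self, w):
        return self.to_img(w).view(-1, 3, 224, 224)

device = "cuda" if torch.cuda.is_available() else "cpu"
G = ToyGenerator().to(device)
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # avoid fp16/fp32 mismatch when backpropagating on GPU

text_feat = clip_model.encode_text(
    clip.tokenize(["a face with blonde hair"]).to(device)).detach()

w = torch.zeros(1, 512, device=device, requires_grad=True)  # latent being edited
opt = torch.optim.Adam([w], lr=0.01)

for _ in range(100):
    img = G(w)                               # image from the current latent
    img_feat = clip_model.encode_image(img)  # CLIP expects 224x224 input
    # Steer the latent so the generated image matches the text prompt.
    loss = 1.0 - torch.cosine_similarity(img_feat, text_feat).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```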