2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01754

HairCLIP: Design Your Hair by Text and Reference Image

Cited by 52 publications (43 citation statements)
References 37 publications
“…Nevertheless, it still heavily relies on pre-trained generative models. HairCLIP [23] achieves disentangled hair editing by feeding separate hairstyle and hair color information into different sub hair mappers to map the input conditions into corresponding latent code changes.…”
Section: Text-driven Image Manipulation (mentioning)
confidence: 99%
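To make the disentanglement described above concrete, here is a minimal PyTorch sketch: two independent mappers, one conditioned on a hairstyle embedding and one on a hair color embedding, each emit a latent offset that is added to the input latent code. All module and variable names are illustrative assumptions, not HairCLIP's actual implementation, which operates on StyleGAN W+ latents with CLIP-conditioned sub-mappers.

```python
import torch
import torch.nn as nn

class SubMapper(nn.Module):
    """Maps a latent code plus a CLIP condition embedding to a latent offset."""
    def __init__(self, latent_dim=512, cond_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, latent_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, w, cond):
        return self.net(torch.cat([w, cond], dim=-1))

# Hypothetical disentangled editing: separate mappers for hairstyle and color.
hairstyle_mapper = SubMapper()
color_mapper = SubMapper()

w = torch.randn(1, 512)        # latent code of the input face (e.g. from a GAN inversion)
e_style = torch.randn(1, 512)  # CLIP embedding of a hairstyle description
e_color = torch.randn(1, 512)  # CLIP embedding of a hair color description

# Each condition contributes its own latent change; because the offsets come
# from separate mappers, the hairstyle and color edits stay disentangled.
w_edited = w + hairstyle_mapper(w, e_style) + color_mapper(w, e_color)
```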
“…However, due to the advantage of ViT on pre-training, the ViT encoder outperformed the CNN encoder, and it is most commonly applied in other works [9]. CLIP has recently been used in various tasks, such as e-commerce image retrieval [9], text-image generation [10], and image segmentation [18]. However, CLIP still lacks the ability to effectively match local information in images to their descriptions in cross-modal information retrieval tasks [12].…”
Section: The Network of CLIP (mentioning)
confidence: 99%
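Both encoder families mentioned in the excerpt are available in the openai/clip package; a short illustration, using the package's own model identifiers:

```python
import clip

print(clip.available_models())        # lists both ResNet (CNN) and ViT backbones
cnn_model, _ = clip.load("RN50")      # CNN (ResNet-50) image encoder
vit_model, _ = clip.load("ViT-B/32")  # ViT image encoder, the variant most works adopt
```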
“…Recently, Radford et al [8] proposed a Contrastive Language-Image Pre-training network (CLIP) which achieved state-of-the-art performance in cross-modal tasks. CLIP employs the pre-trained GPT-2 and Visual Transformer (ViT) to encode descriptions and images into a shared embedding space respectively [8], and has been widely applied to various information retrieval tasks such as e-commerce image retrieval [9], and text-image generation [10]. One of CLIP's limitations is that it cannot identify relations between objects in images.…”
Section: Introduction (mentioning)
confidence: 99%
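The shared embedding space described in this excerpt can be exercised in a few lines with the openai/clip package, following its standard usage: encode an image and several candidate descriptions, then rank the descriptions by cosine similarity. The image filename and prompts below are illustrative assumptions.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # ViT image encoder

image = preprocess(Image.open("face.jpg")).unsqueeze(0).to(device)  # hypothetical input image
texts = clip.tokenize(["a person with curly hair",
                       "a person with straight hair"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Cosine similarity in the shared embedding space ranks the
    # descriptions against the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).softmax(dim=-1)

print(similarity)  # probability-like scores, one per description
```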
“…With the successful development of cross-modal visual and linguistic representations [30,42,43,54], especially the omnipotent CLIP [35], many efforts [7, 18,23,34,46,49,51] have recently started investigating text-driven image manipulation. However, there are no existing methods specifically for image restoration.…”
Section: Text-driven Image Manipulation (mentioning)
confidence: 99%
“…However, there are no existing methods specifically for image restoration. Among these works, the most relevant ones are StyleCLIP [34], HairCLIP [49], and CLIP-Styler [23]. StyleCLIP performs attribute manipulation with exploring learned latent space of StyleGANv2 [21].…”
Section: Text-driven Image Manipulation (mentioning)
confidence: 99%
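A minimal sketch of the latent-space manipulation StyleCLIP performs: optimize a latent code so that the generated image's CLIP embedding matches a text prompt. A toy generator stands in for the pretrained StyleGANv2 that StyleCLIP actually uses, so only the structure of the optimization loop is meaningful here.

```python
import torch
import torch.nn as nn
import clip

# Stand-in generator: a real setup would load a pretrained StyleGANv2; this
# tiny module exists only so the loop below runs end to end.
class ToyGenerator(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.to_img = nn.Linear(latent_dim, 3 * 224 * 224)

    def forward(self, w):
        return self.to_img(w).view(-1, 3, 224, 224)

device = "cuda" if torch.cuda.is_available() else "cpu"
G = ToyGenerator().to(device)
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # avoid fp16/fp32 mismatch when backpropagating on GPU

text_feat = clip_model.encode_text(
    clip.tokenize(["a face with blonde hair"]).to(device)).detach()

w = torch.zeros(1, 512, device=device, requires_grad=True)  # latent being edited
opt = torch.optim.Adam([w], lr=0.01)

for _ in range(100):
    img = G(w)                               # image from the current latent
    img_feat = clip_model.encode_image(img)  # CLIP expects 224x224 input
    # Steer the latent so the generated image matches the text prompt.
    loss = 1.0 - torch.cosine_similarity(img_feat, text_feat).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```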