2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01814
General Facial Representation Learning in a Visual-Linguistic Manner

Cited by 62 publications (52 citation statements)
References 57 publications
“…In order to obtain meaningful initial values for the latent codes that will be optimized to create the anonymized version of the real dataset, namely X_A, we first pair the real images from the original set (i.e., X_R) with fake ones from the generated dataset (i.e., X_F) in the feature space of the ViT-based FaRL [49] image encoder and use their latent codes for initializing the aforementioned trainable codes. The latent codes of the anonymized dataset are then optimized under the following objectives via two novel loss functions: (a) to be similar to the corresponding real ones, up to a certain margin, using the proposed identity loss (L_id), and (b) to preserve the facial attributes of the corresponding real ones by being pulled closer in the feature space of the pre-trained FaRL [49] image encoder using the proposed attribute preservation loss (L_att). In this way, in contrast to state-of-the-art works [17,28], the anonymized images are optimized to inherit the labels of the original ones.…”
Section: Proposed Methods
mentioning confidence: 99%
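The pair of losses quoted above can be made concrete. Below is a minimal PyTorch sketch of one plausible reading, assuming a margin-based hinge for L_id (identity-embedding similarity allowed only up to a margin, matching the ArcFace-space formulation quoted further below) and a cosine-distance penalty in FaRL feature space for L_att; the encoder callables `arcface_encode` and `farl_encode`, the margin value, and the exact functional forms are placeholders, not the cited paper's definitions.

```python
import torch
import torch.nn.functional as F

def identity_loss(fake_imgs, real_imgs, arcface_encode, margin=0.4):
    """L_id (sketch): let the anonymized image resemble the real one in
    identity-embedding space only up to `margin`; penalize any excess."""
    sim = F.cosine_similarity(arcface_encode(fake_imgs), arcface_encode(real_imgs))
    return F.relu(sim - margin).mean()

def attribute_loss(fake_imgs, real_imgs, farl_encode):
    """L_att (sketch): pull anonymized and real images together in the
    pre-trained FaRL feature space so facial attributes (and labels) survive."""
    sim = F.cosine_similarity(farl_encode(fake_imgs), farl_encode(real_imgs))
    return (1 - sim).mean()
```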
“…To achieve this, in contrast to existing works [17,22,28,45,46] that train custom neural networks from scratch, we propose to work directly in the latent space of a powerful pre-trained GAN, optimizing the latent codes directly with losses that explicitly aim to retain the attributes and obfuscate the identities. More concretely, we use a deep feature-matching loss [49] to match the high-level semantic features between the original image and the fake image generated by the latent code, and a margin-based identity loss to control the similarity between the original and the fake image in the ArcFace [9] space. The initialisation of the latent codes is obtained by randomly sampling the latent space of the GAN, using the samples to generate the corresponding synthetic images, and finding the nearest neighbors in a semantic space (FaRL [49]).…”
Section: CIAGAN DeepPrivacy Ours
mentioning confidence: 99%
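The initialization step described here — sample latents, generate fakes, match each real image to its nearest fake in FaRL feature space — is easy to sketch. The following is an illustrative PyTorch version, assuming a pre-trained generator `G`, a FaRL encoder callable `farl_encode`, and a single-batch search; the sample count and any chunking are assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def init_latent_codes(real_imgs, G, farl_encode, n_samples=4096, z_dim=512):
    """Pair each real image with its nearest GAN sample in FaRL feature
    space and return those latents as the trainable initialization."""
    z = torch.randn(n_samples, z_dim)                    # random GAN latents
    # In practice G(z) and farl_encode would run in chunks to fit memory.
    fake_feats = F.normalize(farl_encode(G(z)), dim=-1)
    real_feats = F.normalize(farl_encode(real_imgs), dim=-1)
    nn_idx = (real_feats @ fake_feats.T).argmax(dim=1)   # cosine nearest neighbor
    return z[nn_idx].clone().requires_grad_(True)        # codes to be optimized
```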
“…LAION-5B's scale enables novel dataset curation for computer-vision-related tasks. Recently, researchers have utilized both LAION-5B and a subset, LAION-400M, as a data source in vision-related tasks such as facial representation learning [96] and invasive species mitigation [38]. Within LAION, we have compiled from LAION-5B both LAION-High-Resolution, a 170M subset for super-resolution models, and LAION-Aesthetic…”
Section: Usage Examples
mentioning confidence: 99%
“…Third, we apply Facial Representation Learning (FaRL) [90], a pretrained Transformer model intended for tasks related to facial features. Using the FaRL image encoder, we obtain 512-dimensional features for the extracted face frames.…”
Section: Video
mentioning confidence: 99%
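Extracting such 512-dimensional features is straightforward because FaRL releases its ViT-B/16 image encoder in CLIP-compatible form. The sketch below loads FaRL weights into the OpenAI `clip` package and encodes one face crop; the checkpoint filename and the `"state_dict"` key follow the public FaRL repository but are assumptions to verify against the release you actually download.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# Assumed checkpoint name and key; check against the FaRL release you use.
farl_ckpt = torch.load("FaRL-Base-Patch16-LAIONFace20M-ep64.pth", map_location=device)
model.load_state_dict(farl_ckpt["state_dict"], strict=False)

face = preprocess(Image.open("face_frame.png")).unsqueeze(0).to(device)
with torch.no_grad():
    feats = model.encode_image(face)   # shape: (1, 512) for ViT-B/16
```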