2022
DOI: 10.48550/arxiv.2204.01694
Preprint

"This is my unicorn, Fluffy": Personalizing frozen vision-language representations

Abstract: Large Vision & Language models pretrained on web-scale data provide representations that are invaluable for numerous V&L problems. However, it is unclear how they can be used for reasoning about user-specific visual concepts in unstructured language. This problem arises in multiple domains, from personalized image retrieval to personalized interaction with smart devices. We introduce a new learning setup called Personalized Vision & Language (PerVL) with two new benchmark datasets for retrieving and segmenting…

Cited by 2 publications
(2 citation statements)
References 56 publications
“…This suggests that the coarse-to-fine ontology evolution serve as a good curriculum of training, a topic widely studied in cognition, behavior and psychology [43,57,67]. Future work should investigate this, and favorably study efficiently finetuning algorithms [13,30,38,46,87]. Lastly, we explore LECO on well-known benchmarks with static data distributions over time periods; in the future, we would like to embrace temporal shifts in the data distribution as well [49].…”
Section: Discussion (mentioning)
confidence: 99%
“…Cohen et al [7] looks at personalizing CLIP for specific users and rare queries, but does not build 3D spatial representations conducive to robotics applications, and instead functions on the level of individual images.…”
Section: Related Work (mentioning)
confidence: 99%