"This is my unicorn, Fluffy": Personalizing frozen vision-language representations

Cohen, Niv; Gal, Ran; Meirom, Eli A.; Chechik, Gal; Atzmon, Yuval

doi:10.48550/arxiv.2204.01694

Cited by 2 publications

(2 citation statements)

References 56 publications

“…This suggests that the coarse-to-fine ontology evolution serve as a good curriculum of training, a topic widely studied in cognition, behavior and psychology [43,57,67]. Future work should investigate this, and favorably study efficiently finetuning algorithms [13,30,38,46,87]. Lastly, we explore LECO on well-known benchmarks with static data distributions over time periods; in the future, we would like to embrace temporal shifts in the data distribution as well [49].…”

Section: Discussion (mentioning)

confidence: 99%

Learning with an Evolving Class Ontology

Zhang, Pathak, Wang et al. 2022

Preprint

Lifelong learners must recognize concept vocabularies that evolve over time. A common yet underexplored scenario is learning with class labels that continually refine/expand old classes. For example, humans learn to recognize dog before dog breeds. In practical settings, dataset versioning often introduces refinement to ontologies, such as autonomous vehicle benchmarks that refine a previous vehicle class into school-bus as autonomous operations expand to new cities. This paper formalizes a protocol for studying the problem of Learning with Evolving Class Ontology (LECO). LECO requires learning classifiers in distinct time periods (TPs); each TP introduces a new ontology of "fine" labels that refines old ontologies of "coarse" labels (e.g., dog breeds that refine the previous dog). LECO explores such questions as whether to annotate new data or relabel the old, how to exploit coarse labels, and whether to finetune the previous TP's model or train from scratch. To answer these questions, we leverage insights from related problems such as class-incremental learning. We validate them under the LECO protocol through the lens of image classification (on CIFAR and iNaturalist) and semantic segmentation (on Mapillary). Extensive experiments lead to some surprising conclusions; while the current status quo in the field is to relabel existing datasets with new class ontologies (such as COCO-to-LVIS or Mapillary1.2-to-2.0), LECO demonstrates that a far better strategy is to annotate new data with the new ontology. However, this produces an aggregate dataset with inconsistent old-vs-new labels, complicating learning. To address this challenge, we adopt methods from semi-supervised and partial-label learning. We demonstrate that such strategies can surprisingly be made near-optimal, in the sense of approaching an "oracle" that learns on the aggregate dataset exhaustively labeled with the newest ontology.

“…Cohen et al [7] looks at personalizing CLIP for specific users and rare queries, but does not build 3D spatial representations conducive to robotics applications, and instead functions on the level of individual images.…”

Section: Related Work (mentioning)

confidence: 99%

CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Shafiullah, Paxton, Pinto et al. 2023

Robotics: Science and Systems XIX

We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. CLIP-Fields learns a mapping from spatial locations to semantic embedding vectors. Importantly, we show that this mapping can be trained with supervision coming only from web-image and web-text trained models such as CLIP, Detic, and Sentence-BERT; and thus uses no direct human supervision. When compared to baselines like Mask-RCNN, our method outperforms on few-shot instance identification or semantic segmentation on the HM3D dataset with only a fraction of the examples. Finally, we show that using CLIP-Fields as a scene memory, robots can perform semantic navigation in real-world environments. Our code and demonstration videos are available here: https://mahis.life/clip-fields
