2022
DOI: 10.48550/arxiv.2210.03114
Preprint

CLIP model is an Efficient Continual Learner

Abstract: The continual learning setting aims to learn new tasks over time without forgetting the previous ones. The literature reports several significant efforts to tackle this problem with limited or no access to previous task data. Among such efforts, typical solutions offer sophisticated techniques involving memory replay, knowledge distillation, model regularization, and dynamic network expansion. The resulting methods have a retraining cost at each learning task, dedicated memory requirements, and setting-specific…
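The paper's central claim lends itself to a short illustration: because CLIP matches images against text prompts, a frozen model can "learn" a new task simply by extending the prompt set, with no retraining, replay memory, or network expansion. Below is a minimal sketch of this zero-shot continual evaluation, assuming OpenAI's `clip` package (installable from github.com/openai/CLIP); the task class lists and image path are illustrative, not taken from the paper.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# The model stays frozen; "learning" a task only registers its class names.
seen_classes = []

def learn_task(new_classes):
    """Continual step: accumulate the new task's labels, no weight updates."""
    seen_classes.extend(new_classes)

@torch.no_grad()
def classify(image_path):
    """Zero-shot prediction over every class seen so far."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    prompts = clip.tokenize(
        [f"a photo of a {c}" for c in seen_classes]).to(device)
    img_f = model.encode_image(image)
    txt_f = model.encode_text(prompts)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    sims = (img_f @ txt_f.T).squeeze(0)
    return seen_classes[sims.argmax().item()]

learn_task(["airplane", "automobile"])  # task 1
learn_task(["bird", "cat"])             # task 2
# classify("frame.jpg") now scores all four classes seen across tasks.
```

Because no parameters change between tasks, there is nothing to forget; the trade-off is that accuracy is bounded by what CLIP's pretraining already covers.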

Cited by 1 publication (1 citation statement) | References 29 publications
"…Here, $V^B$ denotes the video dataset for task $B$, $T^B$ denotes the corresponding label set, $f_{\theta_V}^B$ denotes the visual encoder, and $f_{\theta_T}^B$ denotes the text encoder. It is worth noting that the text encoder is usually frozen during training [44, 45], thus the fine-tuning stage primarily concentrates on the optimization of the visual encoder for adaptation to the video domain. For the sake of brevity, the superscript will be omitted in the subsequent paragraphs.…"
Section: Preliminary: Video Action Recognition Using CLIP (citation type: mentioning)
confidence: 99%
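To make the frozen-text-encoder setup described in the citing paper concrete, here is a minimal sketch of that fine-tuning configuration, assuming OpenAI's `clip` package and plain PyTorch; the mean-pooling of frame features over time and the training hyperparameters are illustrative assumptions, not details from either paper.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.float()  # train in fp32 for simplicity (clip loads fp16 on CUDA)

# Freeze every text-side component; only the visual encoder stays trainable.
for module in (model.transformer, model.token_embedding, model.ln_final):
    module.requires_grad_(False)
model.positional_embedding.requires_grad = False
model.text_projection.requires_grad = False

optimizer = torch.optim.AdamW(model.visual.parameters(), lr=1e-5)

def train_step(frames, text_tokens, labels):
    """One step: frames (B, T, 3, H, W), text_tokens from clip.tokenize."""
    B, T = frames.shape[:2]
    # Encode each frame, then mean-pool over time as a simple video feature.
    img_f = model.encode_image(frames.flatten(0, 1)).view(B, T, -1).mean(1)
    with torch.no_grad():
        txt_f = model.encode_text(text_tokens)  # frozen text encoder
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    logits = model.logit_scale.exp() * img_f @ txt_f.T
    loss = torch.nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Keeping the text encoder frozen preserves CLIP's language embedding space, so class-name prompts remain comparable across tasks while the visual side adapts to video frames.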