2023
DOI: 10.48550/arxiv.2301.06267
Preprint

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

Abstract: The ability to quickly learn a new task with minimal instruction, known as few-shot learning, is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples may not be sufficient to characterize an entire concept class. In contrast, humans use cross-modal information to learn new concepts efficiently. In this work, we demonstrate that one can indeed build a better visual dog classifier by reading about dogs and listening to them ba…

Cited by 2 publications
(1 citation statement)
References 66 publications
“…Our proposed Treff adapter outperforms the TIP-adapter by 0.71 percentage points in terms of accuracy, indicating that CALM learns task-specific knowledge while preserving the knowledge from CLAP. Table 3 compares the proposed Treff-adapter with other cross-modality few-shot methods on the ImageNet-ESC [12]. It can be observed that the Treff adapter and the TIP-adapter outperform the cross-modality few-shot learning by a large margin as they are able to make use of zero-shot knowledge transferring explicitly while the crossmodality FSL discards it gradually in the parameter optimisation.…”
Section: Model ESC-50 FSDKaggle18k (citation type: mentioning)
confidence: 99%