2024
DOI: 10.3390/app14031169

VL-Few: Vision Language Alignment for Multimodal Few-Shot Meta Learning

Han Ma, Baoyu Fan, Benjamin K. Ng, et al.

Abstract: Complex real-world tasks, such as visual question answering (VQA), involve models of different modalities. However, traditional multimodal learning requires large amounts of aligned data, such as image-text pairs, and constructing training data at that scale is a challenge for multimodal learning. We therefore propose VL-Few, a simple and effective method for the multimodal few-shot problem. VL-Few (1) proposes modal alignment, which aligns visual features into the language space through a li…
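
The abstract is cut off mid-sentence, but the alignment step it describes, projecting visual features into the language model's embedding space, is a well-known pattern. Below is a minimal PyTorch sketch of that idea, assuming the alignment module is a single learned linear projection; the exact module in VL-Few is not recoverable from the truncated abstract, and all class names and dimensions here are illustrative.

import torch
import torch.nn as nn


class VisionToLanguageAligner(nn.Module):
    # Hypothetical sketch of the "modal alignment" idea from the abstract:
    # map visual features into the language model's token-embedding space so
    # they can be consumed alongside text tokens. The abstract is truncated,
    # so the actual VL-Few module is unknown; a single learned linear
    # projection is assumed here purely for illustration.
    def __init__(self, vision_dim: int, language_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, language_dim)

    def forward(self, visual_features: torch.Tensor) -> torch.Tensor:
        # visual_features: (batch, num_patches, vision_dim), e.g. ViT output.
        # Returns "visual tokens" of shape (batch, num_patches, language_dim)
        # that could be prepended to text embeddings for few-shot VQA prompting.
        return self.proj(visual_features)


if __name__ == "__main__":
    aligner = VisionToLanguageAligner(vision_dim=768, language_dim=1024)
    patches = torch.randn(2, 196, 768)  # dummy ViT-B/16 patch features
    print(aligner(patches).shape)       # torch.Size([2, 196, 1024])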

Cited by 0 publications.
References 40 publications (52 reference statements).