2023
DOI: 10.48550/arxiv.2302.06232
Preprint

Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data

Abstract: Language-supervised vision models have recently attracted great attention in computer vision. A common approach to building such models is to use contrastive learning on paired data across the two modalities, as exemplified by Contrastive Language-Image Pre-Training (CLIP). In this paper, under linear representation settings, (i) we initiate the investigation of a general class of nonlinear loss functions for multimodal contrastive learning (MMCL), including the CLIP loss, and show its connection to singular value decomposition…
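For orientation, the CLIP-style objective referenced in the abstract is a symmetric contrastive (InfoNCE) loss over paired image and text embeddings. The following is a minimal sketch under a linear-representation setting, not the paper's own formulation; the dimensions, temperature, and encoder names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of a CLIP-style symmetric contrastive loss with
# linear (bias-free) encoders. All sizes and the temperature are
# assumptions for demonstration, not values from the paper.
d_img, d_txt, d_emb, batch = 512, 256, 64, 32

W_img = torch.nn.Linear(d_img, d_emb, bias=False)  # linear image encoder
W_txt = torch.nn.Linear(d_txt, d_emb, bias=False)  # linear text encoder

def clip_loss(x_img, x_txt, temperature=0.07):
    # Project each modality into the shared space and L2-normalize.
    z_img = F.normalize(W_img(x_img), dim=-1)
    z_txt = F.normalize(W_txt(x_txt), dim=-1)
    # Pairwise cosine similarities; diagonal entries are the true pairs.
    logits = z_img @ z_txt.t() / temperature
    labels = torch.arange(z_img.size(0))
    # Symmetric cross-entropy: image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Toy paired batch of random features.
loss = clip_loss(torch.randn(batch, d_img), torch.randn(batch, d_txt))
print(loss.item())
```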

Cited by 1 publication (1 citation statement)
References 54 publications
“…In this section, we briefly discuss how to reduce the complexity of the class C in some cases. We use tools from representation learning recently developed by the machine learning community [44,24,23,36,49], which have been used to improve generalization [27,47,45,48,46], robustness [9], and fairness [8,5]. Indeed, in our applications, if C is a very large class, Algorithm 1 becomes inefficient, since we need to search for c_t in a rich class at each iteration.…”
Confidence: 99%