2022
DOI: 10.48550/arxiv.2212.10537
Preprint
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models

Cited by 2 publications (2 citation statements)
References 0 publications
“…foreground objects, lack compositionality, and do not understand concepts of negation. Research efforts mitigating these shortcomings [73,74] are ripe for exploration. Third, we anticipate ConceptFusion to inherit the limitations and biases of foundation models [5,75], warranting further investigations for potential harm as well as research into AI safety and alignment [76,77].…”
Section: Discussion
Confidence: 99%
“…A subsequent work (Diwan et al., 2022) shows that Winoground requires not only compositional language understanding but also other abilities, such as sophisticated commonsense reasoning and locating small objects in low-resolution images, which most vision and language models currently lack. The work of Lewis et al. (2023) is the most relevant to our research, although it primarily deals with toy datasets. Our work also reveals the brittleness of vision-language models through the lens of CAB, which has been overlooked in the past.…”
Section: Compositionality in Vision and Language Models
Confidence: 99%