2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01002
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions

Cited by 31 publications (30 citation statements)
References 15 publications
“…A crucial challenge all these models have to face is the ability to generalize the knowledge learned to unseen data, which can be achieved only if the model is able to compositionally build the multimodal representations, a must for any model of human intelligence (Lake et al., 2017). Since neural‐based VQA models have been shown to produce inconsistent answers to questions that are either similar or mutually exclusive, approaches to mitigate this behaviour have been recently proposed (Ray et al., 2019; Selvaraju et al., 2020). Interestingly, Gandhi and Lake (2020) showed that whilst children are driven by the mutual exclusivity assumption in their learning process, neural networks are not, and set this as an open challenge.…”
Section: The Recent Revival of VQA
confidence: 99%
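The inconsistency this excerpt refers to can be pictured concretely: a model that answers a main question one way but contradicts that answer on a related sub-question. Below is a minimal sketch of such a check, assuming a generic `answer(image_path, question)` inference call; this is a hypothetical stand-in for any VQA model, not the SQuINT method itself, and the caller supplies the sub-answers implied by the main answer.

```python
# Hedged sketch of a VQA consistency check: query a model with a main
# question and its sub-questions, and flag answers that contradict the
# implications of the main answer. `answer` is a hypothetical stand-in
# for any VQA model's inference call.
from typing import Callable, List, Tuple

def consistency_check(
    answer: Callable[[str, str], str],     # (image_path, question) -> answer string
    image_path: str,
    main_question: str,
    sub_questions: List[Tuple[str, str]],  # (sub-question, answer implied by main answer)
) -> Tuple[str, List[str]]:
    """Return the main answer and the sub-questions it is inconsistent with."""
    main_answer = answer(image_path, main_question)
    inconsistent = [
        sub_q
        for sub_q, implied in sub_questions
        if answer(image_path, sub_q).strip().lower() != implied.strip().lower()
    ]
    return main_answer, inconsistent

# Example: a model answering "Is the boy about to catch the ball?" with "yes"
# should also answer the sub-question "Is there a ball in the image?" with "yes".
```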
“…They have also been used in an optimization process where [19] proposed a loss function to find the minimum changes in the input that result in a change in the output of a classifier. Using counterfactual images as explanations can also be thought of as the visual equivalent to observing VQA behavior by rephrasing the question and checking if the model responds consistently [20-22]. Hence, such counterfactual images hint at how consistent these models are to users, and that aids in their mental model improvement.…”
Section: Related Work
confidence: 99%
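The minimal-change objective this excerpt mentions has a standard gradient-based form. The sketch below is a generic illustration under stated assumptions, not the exact loss from reference [19]: it optimizes an additive perturbation `delta` so a differentiable classifier's prediction flips to a chosen counterfactual class, while an L1 penalty keeps the edit small. The names `model`, `target_class`, and the weight `lam` are assumptions for illustration.

```python
# Hedged sketch of counterfactual-image search: find a small perturbation
# `delta` that flips a differentiable classifier's decision. Generic
# illustration only; not the specific formulation cited as [19].
import torch
import torch.nn.functional as F

def counterfactual(model, image, target_class, steps=200, lr=0.01, lam=0.1):
    """image: (1, C, H, W) tensor; target_class: desired counterfactual label."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_class], device=image.device)
    for _ in range(steps):
        logits = model(image + delta)
        # The classification term pushes toward the counterfactual class;
        # the L1 term penalizes large edits, keeping the change minimal.
        loss = F.cross_entropy(logits, target) + lam * delta.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if logits.argmax(dim=1).item() == target_class:
            break  # prediction has flipped; stop while the edit is still small
    return (image + delta).detach()
```

The trade-off between flipping the prediction and keeping the perturbation small is controlled by `lam`: larger values yield sparser, more interpretable edits at the cost of more optimization steps.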