2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00160

Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering

Citations: cited by 44 publications (22 citation statements)
References: 27 publications
“…Deep neural networks often solve the task-specific problem, e.g., image classification, by learning the shortcuts such as the correlations of cows and grass instead of the intended solution, e.g., the features from cows [8]. Recently, the shortcut in deep learning models gains increasing attention across the deep learning field from computer vision (CV) [3,32,55], natural language processing (NLP) [31,36] to reinforcement learning [1]. To date, various methods have been devised to mitigate the negative effects of shortcuts [27].…”
Section: Shortcut Learning (mentioning)
confidence: 99%
“…Recent years have shown rapid developments in the field of multimodal machine learning [2]. Neural architectures are employed in tasks that go beyond single modalities, for example, Visual Question Answering (VQA) [12], Visual Commonsense Reasoning (VCR) [46], etc. In these tasks and beyond, priors and features from different modalities are required and algorithms or deep networks cannot be effective when provided with only a single modality.…”
Section: Multimodal Learning (mentioning)
confidence: 99%
“…Shortcut learning Recently, shortcut learning has received much attention in deep learning areas such as computer vision (CV) [9,43,34] and natural language processing (NLP) [12,37,39]. For most of the tasks in deep learning, both the training and test sets come from the same dataset.…”
Section: Related Work (mentioning)
confidence: 99%