2019
DOI: 10.1002/pra2.7
Dataset bias: A case study for visual question answering

Abstract: We examine the issue of bias in datasets designed to train visual question answering (VQA) algorithms. These datasets include a collection of natural language questions about images (aka visual questions). We consider three popular datasets whose visual questions were, respectively, captured by people with sight, captured by people who are blind, and generated by computers. We first demonstrate that machine learning algorithms can be trained to recognize each dataset's bias, and so determine the source of a novel visual question. We then discuss potenti…
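The abstract's core claim, that a learned model can tell which dataset a visual question came from, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's actual pipeline: the toy questions, the source labels, and the TF-IDF plus logistic-regression setup are all invented for demonstration.

```python
# Minimal sketch of source-dataset classification for visual questions.
# NOT the paper's method: the toy questions and the TF-IDF +
# logistic-regression setup are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical questions mimicking the three sources the paper contrasts:
# sighted crowdworkers, blind users, and automatic generation.
questions = [
    "What color is the umbrella the woman is holding?",   # sighted
    "Is the man on the left wearing glasses?",            # sighted
    "What does this label say? Can you read it to me?",   # blind user
    "Is this the right bus stop for route twelve?",       # blind user
    "How many objects are in the image?",                 # generated
    "What is the color of the object?",                   # generated
]
sources = ["sighted", "sighted", "blind", "blind", "generated", "generated"]

# Word 1-2-grams let the classifier latch onto source-specific phrasing.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(questions, sources)

# Classify a novel visual question by its likely source dataset.
print(clf.predict(["Can you tell me what this medicine bottle says?"]))
```

If such a probe reaches well above chance accuracy on real dataset splits, the datasets carry distinguishable stylistic bias, which is the effect the abstract reports.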

Cited by 9 publications (15 citation statements). References 35 publications.
“…For example, bias has been analyzed for commercial video clips [36], automated facial analysis technologies [48], algorithms that predict whether a convicted criminal will re-offend [54], loan lending algorithms [45], and LabInTheWild studies [8]. More similar to our work, it has been shown that it is possible, for a given visual question, to identify which dataset it belongs to from a known collection of VQA datasets [16,19]. Similar to prior work, our findings expose biases in datasets, and so underscore important next…”
Section: Related Work
confidence: 85%
“…Altogether, our work offers valuable insights to multiple stakeholders: creators of computer vision datasets, designers of VQA systems, and users of VQA systems. It is broadly known that computer vision datasets can embed flawed assumptions that lead to perpetuating and amplifying biases in technology, including for VQA [16,19]. Yet, the types of biases can be unknown to those developing the AI.…”
Section: 149:3
confidence: 99%
“…In the study of [32], the detection of bias in developed models enables recognizing whether questions are answered by a human with normal vision, a blind person, or a robot. It does so by utilizing visual question answering (VQA) datasets cataloged by individuals with normal vision, blind people, and a robot.…”
Section: Mitigation Techniques and Models
confidence: 99%
“…As a result of annotation artifacts, existing NLP datasets contain shallow patterns that correlate with target labels (Gururangan et al, 2018; McCoy et al, 2019; Schuster et al, 2019a; Le Bras et al, 2020; Jia and Liang, 2017; Das et al, 2019). Models tend to exploit these shallow patterns, which we refer to as biases in this paper, instead of learning general knowledge about solving the target task.…”
Section: Introduction
confidence: 99%
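The shallow-pattern exploitation described in the passage above can be probed with an input-only baseline: if a classifier predicts answers from questions alone, never seeing the image, the dataset contains annotation artifacts. A hedged sketch follows; the question/answer pairs are invented, and the CountVectorizer plus logistic-regression probe is an assumption, not the method of any cited paper.

```python
# Sketch of an input-only bias probe. The data is invented: questions
# starting "is there" always answer yes, "are there" always no, so a
# text-only model can score perfectly without ever seeing an image.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

questions = [
    "is there a dog in the picture",
    "is there a cat on the sofa",
    "is there a bird in the tree",
    "is there a ball on the floor",
    "are there any people visible",
    "are there cars on the street",
    "are there boats in the harbor",
    "are there signs on the wall",
]
answers = ["yes"] * 4 + ["no"] * 4

probe = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

# Accuracy far above the 50% majority baseline flags shallow patterns
# that a multimodal model could exploit instead of using the image.
print(cross_val_score(probe, questions, answers, cv=2).mean())
```

On real VQA or NLI data the same probe would run over full train/test splits; any sizeable gap over the majority-class baseline is the bias signal the quoted passage warns about.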