2022
DOI: 10.48550/arxiv.2202.01993
Preprint
Grounding Answers for Visual Questions Asked by Visually Impaired People

Abstract: Visual question answering is the task of answering questions about images. We introduce the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual questions asked by people with visual impairments. We analyze our dataset and compare it with five VQA-Grounding datasets to demonstrate what makes it similar and different. We then evaluate SOTA VQA and VQA-Grounding models and demonstrate that current SOTA algorithms often fail to identify the correct visual evidence where the …

Cited by 3 publications (13 citation statements); references 27 publications.
“…This method relies on object detection networks to provide the anchor points, which adds an extra task for the network to learn. Also, using large generic corpora may not improve the accuracy for special datasets such as VizWiz-VQA-Grounding [3]. In contrast, our proposed method relies only on the feature-maps and does not define another task.…”
Section: Comparison To Existing Methods
confidence: 95%
“…Recently, many applications have been made based on deep neural networks that are expected to be used by the end-users of different platforms. For example, answer grounding methods are helpful in assistive technologies for people with vision impairments [3]. In order to use such technologies offline, it is crucial to implement the method for each platform.…”
Section: Methods
confidence: 99%
“…Visual Question Answering (VQA) is a VL task that has obtained a fundamental role in the evolution of various interactive VL AI systems, such as Visual Dialogue [10], Text-Image Retrieval [11] and Visual Commonsense Reasoning [12]. To this end, there is an extensive range of real-world applications that benefit significantly from the new advances around the VQA task, such as aiding systems for visually impaired individuals [13,14] and self-driving cars [15].…”
Section: Introduction
confidence: 99%