Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d16-1092
Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?

Abstract: We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualization…
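The abstract describes comparing model-generated attention maps against human attention maps. A standard way to quantify such agreement is a rank-correlation measure over the flattened maps. The sketch below is a minimal, self-contained illustration of Spearman rank correlation; the toy maps and helper functions are assumptions for illustration, not the paper's actual evaluation code or data.

```python
# Illustrative sketch: Spearman rank correlation between a human
# attention map and a model attention map, both flattened row-major.
# Toy data only -- not real VQA-HAT annotations.

def ranks(values):
    """Average 1-based ranks of a flat list, ties averaged."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average rank for the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rank correlation of two equally sized flat maps."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Toy 2x3 attention maps, flattened.
human = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]
model = [0.02, 0.08, 0.50, 0.25, 0.10, 0.05]
print(round(spearman(human, model), 3))  # → 0.971
```

A correlation near 1 means the model ranks image regions by importance much as humans do; values near 0 indicate no agreement in the ordering.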

Cited by 225 publications (376 citation statements)
References 16 publications
“…We evaluate the proposed GCA methods and provide both quantitative and qualitative analysis. The former includes: i) ablation analysis of the proposed models (Section-VII-B1), ii) analysis of the effect of uncertainty on answer predictions (Figure-7 (a,b)), iii) differences of top-2 softmax scores for answers for some representative questions (Figure-7 (c,d)), and iv) comparison of the attention map of our proposed uncertainty model against other variants using Rank Correlation (RC) and Earth Mover's Distance (EMD) [45], as shown in Table-IV for VQA-HAT [34] and in Table-III for VQA-X [46]. Finally, we compare PGCA with state-of-the-art methods, as mentioned in Section-VII-D.…”
Section: Methods (mentioning, confidence: 99%)
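The citation above compares attention maps using Rank Correlation and Earth Mover's Distance (EMD). As a minimal illustration of the EMD side, the sketch below computes the closed-form 1D EMD between two flattened maps; true EMD over 2D attention maps requires an optimal-transport solver (e.g. the POT library), and the data and function names here are illustrative assumptions, not the cited paper's implementation.

```python
# Illustrative sketch: 1D Earth Mover's Distance between two attention
# maps, treating the flattened pixels as unit-spaced points on a line.
# For unit spacing, EMD equals the sum of absolute differences of the
# cumulative distribution functions. Toy data only.

def emd_1d(p, q):
    """1D EMD between two distributions over positions 0..n-1."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]  # normalize to probability mass
    q = [x / sq for x in q]
    cdf_p = cdf_q = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        total += abs(cdf_p - cdf_q)
    return total

human = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]
model = [0.02, 0.08, 0.50, 0.25, 0.10, 0.05]
print(round(emd_1d(human, model), 4))  # → 0.13
```

Unlike rank correlation, EMD is sensitive to *where* attention mass sits: moving mass a long distance across the map costs more than a small local shift, which is why the two metrics are often reported together.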
“…In recent years, some researchers have proposed augmenting RNNs with a memory module or an attention module. Proposals include the Neural Turing Machine [131] and the Attention Network [132], which have yielded excellent performance on standard question-answering and video-storytelling tasks [133,134].…”
Section: An Overview of Deep Learning (mentioning, confidence: 99%)
“…For example, though neural networks were previously thought by many to be inscrutable [16], new research suggests interpreting them may actually be possible at some point [12,49]. If successful, this might give rise to the ability to interpret networks learned by neuromorphic chips.…”
Section: Leveraging the Distinctiveness of HPC as an Opportunity (mentioning, confidence: 99%)