2016
DOI: 10.1007/s11263-016-0966-6
VQA: Visual Question Answering

Abstract: We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more…

Cited by 306 publications (195 citation statements)
References 39 publications
“…respectively. 3 We begin from a seed set of 250 manually constructed patterns, and extend it with 274 natural patterns derived from VQA1.0 [4] through templatization of words from our ontology. 4 To increase the question diversity, apart from using synonyms for objects and attributes, we incorporate probabilistic sections into the patterns, such as optional phrases [x] and alternate expressions (x|y), which get instantiated at random.…”
Section: The Question Engine
mentioning
confidence: 99%
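The excerpt above describes instantiating question patterns that contain optional phrases `[x]` and alternate expressions `(x|y)` at random. A minimal sketch of such an instantiation step is below; the function name and example pattern are hypothetical, not taken from the cited work:

```python
import random
import re

def instantiate(pattern, rng=random):
    """Randomly instantiate one question pattern.

    Optional phrases are written as [x] (independently kept or dropped);
    alternate expressions as (x|y) (one option chosen at random).
    """
    # Resolve each optional phrase [x]: keep it or drop it with equal probability.
    pattern = re.sub(r"\[([^\]]*)\]",
                     lambda m: m.group(1) if rng.random() < 0.5 else "",
                     pattern)
    # Resolve each alternation (x|y|...): pick one option uniformly.
    pattern = re.sub(r"\(([^)|]*(?:\|[^)|]*)+)\)",
                     lambda m: rng.choice(m.group(1).split("|")),
                     pattern)
    # Collapse the extra whitespace left behind by dropped phrases.
    return re.sub(r"\s+", " ", pattern).strip()

# Hypothetical pattern for illustration:
print(instantiate("(What|Which) color is the [small] ball?"))
```

Each call yields one of the four possible surface forms, so repeated sampling over many patterns increases question diversity, as the excerpt notes.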
“…For VQA1.0, blind models achieve 50% in accuracy without even considering the images whatsoever [4]. Similarly, for VQA2.0, 67% and 27% of the binary and open questions respectively are answered correctly by such models [11].…”
mentioning
confidence: 94%
“…To further investigate the relevance of our findings to biological visual systems, in follow-up work we intend to deploy our modulation scheme on architectures that bear more similarity to the primate visual hierarchy, such as deep convolutional networks (Kriegeskorte, 2015), datasets of naturalistic images such as ImageNet (Russakovsky et al, 2015), and general naturalistic tasks such as visual question answering (Agrawal et al, 2017). This will allow us to assess whether the functional advantage provided by early modulation holds true in a more realistic scenario, and whether the resulting modulation schemes resemble those observed in the early visual areas of the primate brain.…”
Section: Discussion
mentioning
confidence: 99%
“…Next, we verify the applicability of the 3-D scene graph by demonstrating two major applications: 1) visual question answering (VQA) and 2) task planning. The two applications are under active research in the computer vision [5], NLP [6], and robotics [7] communities.…”
mentioning
confidence: 99%