2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016
DOI: 10.1109/cvpr.2016.500

Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources

Abstract: We propose a method for visual question answering which combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. This allows more complex questions to be answered using the predominant neural network-based approach than has previously been possible. It particularly allows questions to be asked about the contents of an image, even when the image itself does not contain the whole answer. The method co…

Cited by 299 publications
(226 citation statements)
References 29 publications
“…A number of recent works have proposed visual question answering datasets [3,22,26,31,10,46,38,36] and models [9,25,2,43,24,27,47,45,44,41,35,20,29,15,42,33,17]. Our work builds on top of the VQA dataset from Antol et al [3], which is one of the most widely used VQA datasets.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Specifically, VQA takes an image and a corresponding natural language question as input and outputs the answer. It is a classification problem in which candidate answers are restricted to the most common answers appearing in the dataset and requires deep analysis and understanding of images and questions such as image recognition and object localization [16,27,38,42]. Current models can be classified into three main categories: early fusion models, later fusion models, and external knowledge-based models.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
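
The statement above frames VQA as classification over the most common answers in the dataset. A minimal sketch of that restriction step follows, assuming answers are plain strings normalized by lowercasing and trimming; the function names, the toy answer list, and the `top_k` value are hypothetical and are not taken from the cited paper or from any particular VQA codebase.

```python
from collections import Counter


def build_answer_vocab(answers, top_k):
    """Map the top_k most frequent (normalized) answers to class indices."""
    counts = Counter(a.strip().lower() for a in answers)
    return {ans: idx for idx, (ans, _) in enumerate(counts.most_common(top_k))}


def encode_answer(answer, vocab):
    """Return the class index of an answer, or None if it is not a candidate."""
    return vocab.get(answer.strip().lower())


# Toy training answers (illustrative only).
train_answers = ["yes", "no", "yes", "red", "yes", "no", "frisbee"]

# Keep only the two most common answers as candidate classes.
vocab = build_answer_vocab(train_answers, top_k=2)

in_vocab = encode_answer("Yes", vocab)       # normalized to "yes"
out_of_vocab = encode_answer("frisbee", vocab)  # rare answer, dropped
```

Answers that fall outside the candidate set map to `None` here; a real pipeline would have to decide whether to drop such training examples or assign them to a catch-all class.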
“…Several researchers employed commonsense knowledge to enrich high-level understanding tasks such as visual ques- Figure 2: (a) Example of questions that require explicit external knowledge [35], (b) Example where knowledge helps [37]. (c) Ways to integrate background knowledge: i) Pre-process knowledge and augment input [1]; ii) Incorporate knowledge as embeddings [36]; iii) Post-processing using explicit reasoning mechanism [2]; iv) Using knowledge graph to influence NN architecture [24].…”
Section: High-level Common-sense Knowledge
Citation type: mentioning
confidence: 99%