2022
DOI: 10.1609/aaai.v36i10.21346

Dynamic Key-Value Memory Enhanced Multi-Step Graph Reasoning for Knowledge-Based Visual Question Answering

Abstract: Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to correctly answer image-related questions using knowledge that is not presented in the given image. It is not only a more challenging task than regular VQA but also a vital step towards building a general VQA system. Most existing knowledge-based VQA systems process knowledge and image information similarly and ignore the fact that the knowledge base (KB) contains complete information about a triplet, while the e…

Citations: cited by 9 publications (3 citation statements)
References: 32 publications (58 reference statements)
“…Graph Neural Network. Graph neural network (GNN) (Li and Moens 2022; Scarselli et al 2008; Li et al 2019; Gao et al 2020; Zhu et al 2020) is a highly effective framework for representing graph-structured data. GNNs follow the message passing scheme that updates each node's feature using its neighborhoods of nodes to capture specific patterns of a graph.…”
Section: Related Work (mentioning)
confidence: 99%
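
The message passing scheme described in the statement above can be illustrated with a minimal sketch. It is an assumption-laden simplification, not the method of the cited paper or of any GNN library: the function name `message_passing_step`, the toy graph, and the choice of plain mean aggregation with no learned weights are all hypothetical and chosen only for illustration.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping node -> list of floats (the node's feature vector)
    edges: list of (u, v) pairs
    Note: real GNN layers apply learned transformations; this sketch omits them.
    """
    # Build an adjacency list from the edge list.
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = {}
    for node, own in features.items():
        # Aggregate the node's own feature together with its neighbours' features.
        group = [own] + [features[n] for n in neighbors[node]]
        # Element-wise mean across the group.
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


# Toy path graph: a - b - c (hypothetical data).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "b"), ("b", "c")]
updated = message_passing_step(features, edges)
```

After one step, each node's feature is a blend of its own value and its neighbours' values; stacking several such steps lets information travel further across the graph.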
“…GNNs follow the message passing scheme that updates each node's feature using its neighborhoods of nodes to capture specific patterns of a graph. Some encouraging works (Li and Moens 2022; Li et al 2019; Gao et al 2020; Zhu et al 2020) study graph neural networks to solve the VQA task. For example, ReGAT (Li et al 2019) represents the image as a graph and captures interactions between objects through the graph attention mechanism.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several benchmark datasets [32,42,48,49], including complex reasoning questions, facilitate the development of this field. To incorporate with external knowledge, early methods turned to textual Knowledge Bases (KBs) and applied either graph-based [24,36,60,61] or transformer-based approaches [11,13] to introduce the KB information into the question answering module. Besides, multi-modal KBs are also leveraged to solve VQA tasks.…”
Section: Related Work, 2.1 VQA Tasks (mentioning)
confidence: 99%