Findings of the Association for Computational Linguistics: EMNLP 2023 2023
DOI: 10.18653/v1/2023.findings-emnlp.437
|View full text |Cite
|
Sign up to set email alerts
|

MM-Reasoner: A Multi-Modal Knowledge-Aware Framework for Knowledge-Based Visual Question Answering

Mahmoud Khademi,
Ziyi Yang,
Felipe Frujeri
et al.

Abstract: Thanks to the strong reasoning capabilities of Large Language Models (LLMs), recent approaches to knowledge-based visual question answering (KVQA) utilize LLMs with a global caption of an input image to answer a question. However, these approaches may miss key visual information that is not captured by the caption. Moreover, they cannot fully utilize the visual information required to answer the question. To address these issues, we introduce a new framework called Multi-Modal Knowledge-Aware Reasoner (MM-Reas… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 14 publications
(27 reference statements)
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?