2023
DOI: 10.1609/aaai.v37i11.26598

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

Abstract: Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems, most of the existing datasets focus on understanding the content relationships within a single image and not across multiple images. In this study, we propose a new multi-image document VQA dataset, SlideVQA, containing 2.6k+ slide decks composed of 52k+ slide images and 14.5k …

Citations: cited by 5 publications (1 citation statement)
References: 32 publications (44 reference statements)
“…• Multi-page QA w/ Multi-hop & Discrete & Visual Reasoning requires understanding the content relationship via multi-hop reasoning as well as discrete/visual reasoning on multi-page documents (Tanaka et al 2023; Landeghem et al 2023).…”
Section: Dataset Collection (mentioning)
confidence: 99%