2023
DOI: 10.48550/arxiv.2301.04883
Preprint

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

Abstract: Visual question answering on document images that contain textual, visual, and layout information, called document VQA, has received much attention recently. Although many datasets have been proposed for developing document VQA systems, most existing datasets focus on understanding content relationships within a single image rather than across multiple images. In this study, we propose a new multi-image document VQA dataset, SlideVQA, containing 2.6k+ slide decks composed of 52k+ slide images and 14.5k q…

Cited by 0 publications
References 34 publications