2021
DOI: 10.48550/arxiv.2104.14336
Preprint

Document Collection Visual Question Answering

Abstract: Current tasks and methods in Document Understanding aim to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices) that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA), a new dataset and related task, where questions are posed over a whole collection of document images and the goal is not only to provide the answer to the given question, b…

Citations: cited by 1 publication (3 citation statements)
References: 27 publications
“…Document Intelligence can be considered as an umbrella term covering problems of Key Information Extraction [10,54], Table Detection [41,38] and Structure Recognition [39,55], Document Layout Segmentation [5,4] Document Layout Generation [6,36,3,48], Document Visual Question Answering [51,50,32], Document Image Enhancement [49,22,47] which involves the understanding of visually rich semantic information and structure of different layout entities of a whole page.…”
Section: Related Work (mentioning)
confidence: 99%
“…Secondly, we decide to use a commercial OCR engine, specifically Amazon Textract, over Tesseract. It is because the performance of the OCR engines can significantly affect the model's performance which can be seen in fields that use OCR annotations, such as in fine-grained classification [29,30,31], in scene-text visual question answering [9,44,8,13], in document visual question answering (DocVQA) [50,33]. Apart from improving the annotation quality significantly, we want to level the differences between research groups and companies.…”
Section: Introduction (mentioning)
confidence: 99%