Document Collection Visual Question Answering

Tito, Rubèn; Karatzas, Dimosthenis; Valveny, Ernest

doi:10.48550/arxiv.2104.14336

Cited by 1 publication

(3 citation statements)

References 27 publications

“…Document Intelligence can be considered as an umbrella term covering problems of Key Information Extraction [10,54], Table Detection [41,38] and Structure Recognition [39,55], Document Layout Segmentation [5,4], Document Layout Generation [6,36,3,48], Document Visual Question Answering [51,50,32], Document Image Enhancement [49,22,47] which involves the understanding of visually rich semantic information and structure of different layout entities of a whole page.…”

Section: Related Work (mentioning)

confidence: 99%

“…Secondly, we decide to use a commercial OCR engine, specifically Amazon Textract, over Tesseract. It is because the performance of the OCR engines can significantly affect the model's performance which can be seen in fields that use OCR annotations, such as in fine-grained classification [29,30,31], in scene-text visual question answering [9,44,8,13], in document visual question answering (DocVQA) [50,33]. Apart from improving the annotation quality significantly, we want to level the differences between research groups and companies.…”

Section: Introduction (mentioning)

confidence: 99%

“…IDL is a digital archive of documents created by industries which influence public health, hosted by the University of California, San Francisco Library. IDL has already been used in the literature for building datasets: IIT-CDIP [25], RVL-CDIP [18], DocVQA [50,33]. Hence, our OCR annotations can be used to further advance in these tasks.…”

Section: Introduction (mentioning)

confidence: 99%

OCR-IDL: OCR Annotations for Industry Document Library Dataset

Biten, Tito, Gómez et al. 2022

Preprint

Self Cite

Pretraining has proven successful in Document Intelligence tasks, where a deluge of documents is used to pretrain models that are only later finetuned on downstream tasks. One problem with pretraining approaches is the inconsistent use of pretraining data with different OCR engines, which leads to incomparable results between models. In other words, it is not obvious whether a performance gain comes from the amount of data and the particular OCR engine used or from the proposed model. To remedy the problem, we make public the OCR annotations for IDL documents, produced with a commercial OCR engine given its superior performance over open-source OCR models. The contributed dataset (OCR-IDL) has an estimated monetary value of over 20K US$. It is our hope that OCR-IDL can be a starting point for future work on Document Intelligence. All of our data and its collection process, together with the annotations, can be found at https://github.com/furkanbiten/idl_data.