2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022
DOI: 10.1109/wacv51458.2022.00264

InfographicVQA

Cited by 50 publications (38 citation statements)
References 19 publications
“…BROS was proposed with an effective pretraining method (i.e., area masking) and a relative positional encoding trick. To validate the effectiveness of Webvicob-generated data, we pretrain BROS and measure performance on DocVQA Task 1 (Tito et al, 2021) and Task 3 (Mathew et al, 2022).…”
Section: Comparison Methods (mentioning)
confidence: 99%
“…Document Intelligence can be considered as an umbrella term covering problems of Key Information Extraction [10,54], Table Detection [41,38] and Structure Recognition [39,55], Document Layout Segmentation [5,4], Document Layout Generation [6,36,3,48], Document Visual Question Answering [51,50,32], Document Image Enhancement [49,22,47] which involves the understanding of visually rich semantic information and structure of different layout entities of a whole page.…”
Section: Related Work (mentioning)
confidence: 99%
“…VQA for documents first appears in the DocVQA dataset (Mathew et al, 2021b), which contains more than 12,000 documents and corresponding 5,000 questions. Later, InfographicVQA (Mathew et al, 2021a) is also proposed, which is a VQA benchmark for infographic images in the documents.…”
Section: Visual Information Extraction (mentioning)
confidence: 99%