2019 International Conference on Document Analysis and Recognition (ICDAR) 2019
DOI: 10.1109/icdar.2019.00156

OCR-VQA: Visual Question Answering by Reading Text in Images

Cited by 100 publications (80 citation statements)
References 16 publications
“…3) We show that the predicted bounding box can provide evidence for analyzing network behavior in addition to improving the performance. 4) Our proposed LaAP-Net outperforms state-of-the-art approaches on three benchmark text VQA datasets, TextVQA, ST-VQA (Biten et al, 2019b) and OCR-VQA (Mishra et al, 2019), by a noticeable margin.…”
Section: Introduction (mentioning)
confidence: 84%
“…How to leverage information from text tokens, how to understand relationships between text tokens and visual objects or between different tokens, how to predict a text token with language models are still problems that need to be explored. [24] propose to extract text blocks before conducting optical character recognition. The block features are then combined with image features and question features to predict the final answer.…”
Section: Related Work (mentioning)
confidence: 99%
“…11%) [4] can be attributed to the straightforward architecture used for their authors, more analysis are required to determine the convenience of using this n-gram representation for the answer space. As this task is attracting attention, recent works present the task by introducing new databases, [10] introduces a new database, OCR-VQA-200K comprising images of bookcovers, [12] introduces a database containing images of business brands, movie posters and book covers.…”
Section: Related Work (mentioning)
confidence: 99%