2019 International Conference on Document Analysis and Recognition (ICDAR) 2019
DOI: 10.1109/icdar.2019.00251

ICDAR 2019 Competition on Scene Text Visual Question Answering

Abstract: This paper presents final results of ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question/answer pairs where the answer is always grounded on text instances present in the image. The images are taken from…

Cited by 34 publications (22 citation statements)
References 23 publications
“…If not observe carefully, it's rather easy to obtain the wrong answer 2 instead of 3. The reasons for this error include object occlusion, near and far degrees, and the limitation (Biten et al, 2019), which requires to recognize the numbers, symbols and proper nouns in a scene. In Figure 7(c), subjective judgment is needed to answer the question is this man happy.…”
Section: Qualitative Analysis
mentioning
confidence: 99%
“…In Figure 7(b), the question what time should you pay can be answered by recognizing the text semantic understanding in the image. Text semantic understanding belongs to another task, namely text visual question answering (Biten et al, 2019), which requires to recognize the numbers, symbols and proper nouns in a scene. In Figure 7(c), subjective judgment is needed to answer the question is this man happy.…”
mentioning
confidence: 99%
“…Scene text image recognition aims to recognize the text characters from the input image, which is an important computer vision task that involves text information processing. It has been widely used in text retrieval [25], sign recognition [17], license plate recognition [35] and other scene-text-based image understanding tasks [6,34]. However, due to the various issues such as low sensor resolution, blurring, poor illumination, etc., the quality of captured scene text images may not be good enough, which brings many difficulties to scene text recognition in practice.…”
Section: Introduction
mentioning
confidence: 99%
“…In recent years, research topics around scene text have been very active [37,43,47]. Scene text-related research plays a very important role in many computer vision tasks [3,48]. However, imperfect imaging conditions often hinder the progress of these fields.…”
Section: Introductionmentioning
confidence: 99%