2020
DOI: 10.1007/978-3-030-58536-5_44
TextCaps: A Dataset for Image Captioning with Reading Comprehension

Cited by 141 publications (146 citation statements)
References 25 publications
“…Dognin et al (2020) recently discussed their winning entry to the VizWiz Grand Challenge. In addition, Sidorov et al (2020) introduced a model that has been shown to gain significant performance improvements by using OCR tokens. We intend to compare our model with these and improve our work based on the observations made.…”
Section: Discussion (confidence: 99%)
“…It has also been used in image captioning to aid learning novel objects (Yao et al, 2017; Li et al, 2019). Also, Sidorov et al (2020) introduced an M4C model that recognizes text, relates it to its visual context, and decides what part of the text to copy or paraphrase, requiring spatial, semantic, and visual reasoning between multiple text tokens and visual entities such as objects.…”
Section: Created (confidence: 99%)
“…(3) Text-based reading comprehension. TextCaps [49] and text-based VQA [50,3] introduce new vision-and-language tasks, which require recognizing text, relating it to its visual context, and performing semantic and visual reasoning between multiple text tokens and visual entities, such as objects. Similarly, there are many application demands for video text understanding across various industries and in our daily lives.…”
Section: Link To Other Video-and-Language Applications (confidence: 99%)
“…The key interest of this dataset is detecting and annotating text generation errors from PLMs. Therefore, it is different from conventional text generation datasets (e.g., Multi-News (Fabbri et al, 2019), TextCaps (Sidorov et al, 2020)) that are constructed to train models to learn text generation (e.g., generating texts from images or long documents). It is also different from grammatical error correction (GEC) datasets (Zhao et al, 2018; Flachs et al, 2020) that are built from human-written texts, usually by second language learners.…”
Section: Introduction (confidence: 99%)