2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020
DOI: 10.1109/cvpr42600.2020.01001

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

Cited by 148 publications (168 citation statements)
References 28 publications

“…However, this common embedding space has difficulty utilizing the image object features. We observe this by training the M4C (Hu et al, 2020) network without the image object modality. The accuracy is almost unaffected.…”
Section: Context-enriched OCR Representation
Citation type: mentioning
confidence: 94%
“…The generated answer could be selected from a fixed answer vocabulary or one of the OCR tokens by the copy module. The copy module is further improved by M4C (Hu et al, 2020) using dynamic pointer network. The M4C also proposes a transformer based network with 3 multi-modal input (question, image object features and OCR features).…”
Section: Text Visual Question Answering
Citation type: mentioning
confidence: 99%
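
The citation statement above describes an answer token being chosen either from a fixed answer vocabulary or from the OCR tokens found in the image. A minimal sketch of that selection step follows, assuming the vocabulary scores and the pointer scores over OCR tokens are simply concatenated and the highest-scoring entry wins. The function name `predict_token`, the helper `dot`, and the use of a plain dot product for the pointer scores are illustrative assumptions, not the paper's actual implementation.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_token(
    decoder_state: Sequence[float],
    vocab_scores: Sequence[float],
    ocr_features: Sequence[Sequence[float]],
    vocab: List[str],
    ocr_tokens: List[str],
) -> str:
    """Pick one answer token from the fixed vocabulary or the OCR tokens.

    Assumption: pointer scores are a plain dot product between the decoder
    state and each OCR token feature (a simplification for illustration).
    """
    # One pointer score per OCR token; the number of OCR tokens varies per image.
    pointer_scores = [dot(decoder_state, feat) for feat in ocr_features]

    # Concatenate fixed-vocabulary scores with the per-image pointer scores.
    scores = list(vocab_scores) + pointer_scores

    # The highest-scoring index decides between "generate" and "copy".
    best = max(range(len(scores)), key=scores.__getitem__)
    if best < len(vocab):
        return vocab[best]
    return ocr_tokens[best - len(vocab)]


if __name__ == "__main__":
    # Hypothetical toy inputs: two vocabulary words, two OCR tokens.
    print(
        predict_token(
            decoder_state=[1.0, 0.0],
            vocab_scores=[0.2, 0.1],
            ocr_features=[[0.9, 0.0], [0.1, 1.0]],
            vocab=["yes", "no"],
            ocr_tokens=["STOP", "EXIT"],
        )
    )
```

Because the pointer scores are computed per image, the set of candidate tokens grows and shrinks with the number of OCR tokens detected, while the vocabulary part stays fixed.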