2023
DOI: 10.1371/journal.pone.0290315
Image to English translation and comprehension: INT2-VQA method based on inter-modality and intra-modality collaborations

Xianli Sheng

Abstract: Existing visual question answering methods typically concentrate only on visual targets in images, ignoring the key textual content in the images, thereby limiting the depth and accuracy of image content comprehension. Motivated by this, we focus on the task of text-based visual question answering, address the performance bottleneck caused by the over-fitting risk in existing self-attention-based models, and propose a scene-text visual question answering method called INT2-VQA that fuses knowledge …

Cited by 1 publication
References: 61 publications