2022
DOI: 10.1049/ipr2.12470
Variational joint self‐attention for image captioning

Abstract: The image captioning task has attracted great attention from many researchers, and significant progress has been made in the past few years. Existing image captioning models, which mainly apply attention‐based encoder‐decoder architectures, have achieved great advances in image captioning. These attention‐based models, however, are limited in caption generation due to potential errors resulting from inaccurate detection of objects and incorrect attention to the objects. To alleviate this limitation, a Var…

Cited by 3 publications (2 citation statements)
References 29 publications (72 reference statements)
“…The attention module is mainly used in tasks where context information is important, such as visual question answering (VQA), image captioning, and scene character recognition [33,34]. However, when the concept of attention was expanded to self-attention, it began to be used in CNN.…”
Section: Attention Module
confidence: 99%
“…These models require giant data sets, long training time, and high hardware requirements, which can only be satisfied in laboratories. Therefore, many researchers use self-attention to improve fully convolutional models, slightly increasing the computational complexity of the model to obtain better detection results [27][28][29][30]. In 2022, Guo proposed the visual attention network (VAN) [18], which adds self-attention to the convolutional layer to form the VAN module, significantly improving the performance of the fully convolutional network.…”
Section: Introduction
confidence: 99%
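The excerpts above describe adding self-attention to convolutional feature maps so each spatial location can draw context from the whole map. The following is a minimal numpy sketch of that idea, a plain dot-product self-attention over the spatial positions of a CNN feature map; it is an illustrative assumption of the general technique, not the actual VAN module from Guo et al. or the cited detectors:

```python
import numpy as np

def spatial_self_attention(feat):
    """Toy self-attention over the spatial positions of a CNN feature map.

    feat: array of shape (C, H, W). Each of the H*W positions attends to
    every other position, so the output at each location mixes in context
    from the whole map -- the property the cited works exploit to improve
    fully convolutional models. Illustrative sketch only.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                  # (N, C), N = H*W positions
    scores = x @ x.T / np.sqrt(c)                 # (N, N) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1
    out = attn @ x                                # context-mixed features
    return out.T.reshape(c, h, w)

# Apply to a random 8-channel, 4x4 feature map.
fmap = np.random.rand(8, 4, 4)
out = spatial_self_attention(fmap)
print(out.shape)  # (8, 4, 4)
```

Real architectures insert a block like this between convolutional layers (usually with learned query/key/value projections and a residual connection), trading a small amount of extra computation for global context, as the quoted passage notes.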