2022
DOI: 10.3390/app12136733

Dual-Modal Transformer with Enhanced Inter- and Intra-Modality Interactions for Image Captioning

Abstract: Image captioning is oriented towards describing an image with the best possible use of words that can provide a semantic, relatable meaning of the scenario inscribed. Different models can be used to accomplish this arduous task depending on the context and requirement of what needs to be achieved. An encoder–decoder model which uses the image feature vectors as an input to the encoder is often marked as one of the appropriate models to accomplish the captioning process. In the proposed work, a dual-modal trans…

citations
Cited by 7 publications
(2 citation statements)
references
References 45 publications
(46 reference statements)
“…The transformer is able to avoid any duplication by employing the attention in a comprehensive way between the input and the output. Extensive approaches were proposed to employ the transformer models in image captioning [74][75][76][77][78][79][80][81][82][83][84][85].…”
Section: Transformer-based (citation type: mentioning)
confidence: 99%
“…Kumar et al [61] extracted the feature vectors and detected objects in the image. Then, using the feature, vector embedding and object embedding are created, respectively.…”
Section: Grid Features (citation type: mentioning)
confidence: 99%