2024
DOI: 10.3390/s24061796

Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning

Deema Abdal Hafeth,
Stefanos Kollias

Abstract: Image captioning is a technique used to generate descriptive captions for images. Typically, it involves employing a Convolutional Neural Network (CNN) as the encoder to extract visual features, and a decoder model, often based on Recurrent Neural Networks (RNNs), to generate the captions. Recently, the encoder–decoder architecture has witnessed the widespread adoption of the self-attention mechanism. However, this approach faces certain challenges that require further research. One such challenge is that the …

Cited by: 0 publications
References: 54 publications