2020
DOI: 10.1109/access.2020.2999568

Cross-Lingual Image Caption Generation Based on Visual Attention Model

Abstract: As an interesting and challenging problem, generating image captions automatically has attracted increasing attention in the natural language processing and computer vision communities. In this paper, we propose an end-to-end deep learning approach for image caption generation. We leverage image feature information at specific locations at each time step and generate the corresponding caption description through a semantic attention model. The end-to-end framework allows us to introduce an independent recurrent structu…
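The abstract describes attending to image features at particular locations at each decoding step while a recurrent decoder emits the caption. The sketch below is a minimal, generic illustration of such a soft visual-attention decoding step in PyTorch; the module names, dimensions, and the use of a standard LSTM cell (rather than the paper's independent recurrent structure) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftVisualAttention(nn.Module):
    """Soft attention over a grid of CNN region features (illustrative)."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(
            self.feat_proj(features) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                   # (batch, num_regions)
        alpha = F.softmax(e, dim=1)                      # attention weights sum to 1
        context = (alpha.unsqueeze(-1) * features).sum(dim=1)  # weighted region mix
        return context, alpha

class CaptionDecoderStep(nn.Module):
    """One decoding step: attend to image regions, then predict the next word."""
    def __init__(self, vocab_size, embed_dim, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attend = SoftVisualAttention(feat_dim, hidden_dim, attn_dim)
        # Assumption: a plain LSTM cell stands in for the paper's recurrent unit.
        self.rnn = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, features, state):
        h, c = state
        context, alpha = self.attend(features, h)
        h, c = self.rnn(torch.cat([self.embed(prev_word), context], dim=1), (h, c))
        return self.out(h), (h, c), alpha                # next-word logits, state, weights

# Shape-only usage example (hypothetical sizes):
# feats = torch.randn(2, 49, 512)                       # 7x7 grid of 512-d features
# step = CaptionDecoderStep(10000, 300, 512, 256, 256)
# h = c = torch.zeros(2, 256)
# logits, (h, c), alpha = step(torch.zeros(2, dtype=torch.long), feats, (h, c))
```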

Cited by 17 publications (8 citation statements)
References 47 publications (69 reference statements)
“…Finally, visual features and relation features were integrated into some benchmark captioning models for evaluation. Moreover, several studies explored cross-lingual aspects [25,26,27]. They tried to build adaptive captioning models that could work well with multiple languages instead of only specific ones.…”
Section: Previous Approaches
Mentioning confidence: 99%
“…The attention mechanism has been widely used in tasks such as natural language description [33], machine translation [34], image feature extraction [35,36], and image classification [37,38]. In essence, the attention mechanism is a weight probability distribution mechanism that assigns larger weights to important content and smaller weights to other content.…”
Section: Ph(i) = Ph I
Mentioning confidence: 99%
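The statement above characterizes attention as a probability distribution of weights over content. A toy sketch, assuming made-up relevance scores, of how a softmax turns such scores into weights that sum to one:

```python
import numpy as np

scores = np.array([2.0, 0.5, -1.0, 0.1])          # hypothetical relevance scores for 4 regions
weights = np.exp(scores) / np.exp(scores).sum()   # softmax: larger score -> larger weight
print(weights.round(2), weights.sum())            # approx [0.70 0.16 0.03 0.11], sums to 1.0
```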
“…Additionally, the Lu et al. (2018) study offered a unique model based on Visual Attention that not only gives a greater visual comprehension of the model's judgments, but also greatly outperforms previous state-of-the-art baseline techniques for this task. In another study, Wang et al. (2020) evaluated their proposed model on the most popular benchmark datasets. We report an improvement of 3.9% over existing state-of-the-art approaches for cross-lingual image captioning on the Flickr8k CN dataset on the CIDEr metric.…”
Section: Consumers' Attention
Mentioning confidence: 99%