2020
DOI: 10.1109/access.2020.2999568

Cross-Lingual Image Caption Generation Based on Visual Attention Model

Abstract: As an interesting and challenging problem, generating image captions automatically has attracted increasing attention in the natural language processing and computer vision communities. In this paper, we propose an end-to-end deep learning approach for image caption generation. We leverage image feature information at specific locations at each time step and generate the corresponding caption description through a semantic attention model. The end-to-end framework allows us to introduce an independent recurrent structu…
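The abstract describes attending to image features at particular locations at each decoding step while a recurrent decoder emits the caption. The sketch below is a minimal, generic illustration of such a soft visual-attention decoding step in PyTorch; the module names, dimensions, and the use of a standard LSTM cell (rather than the paper's independent recurrent structure) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftVisualAttention(nn.Module):
    """Soft attention over a grid of CNN region features (illustrative)."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(
            self.feat_proj(features) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                   # (batch, num_regions)
        alpha = F.softmax(e, dim=1)                      # attention weights sum to 1
        context = (alpha.unsqueeze(-1) * features).sum(dim=1)  # weighted region mix
        return context, alpha

class CaptionDecoderStep(nn.Module):
    """One decoding step: attend to image regions, then predict the next word."""
    def __init__(self, vocab_size, embed_dim, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attend = SoftVisualAttention(feat_dim, hidden_dim, attn_dim)
        # Assumption: a plain LSTM cell stands in for the paper's recurrent unit.
        self.rnn = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, features, state):
        h, c = state
        context, alpha = self.attend(features, h)
        h, c = self.rnn(torch.cat([self.embed(prev_word), context], dim=1), (h, c))
        return self.out(h), (h, c), alpha                # next-word logits, state, weights

# Shape-only usage example (hypothetical sizes):
# feats = torch.randn(2, 49, 512)                       # 7x7 grid of 512-d features
# step = CaptionDecoderStep(10000, 300, 512, 256, 256)
# h = c = torch.zeros(2, 256)
# logits, (h, c), alpha = step(torch.zeros(2, dtype=torch.long), feats, (h, c))
```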

Cited by 17 publications (8 citation statements)
References 47 publications (69 reference statements)
“…Finally, visual features and relation features were integrated into some benchmark captioning models for evaluation. Moreover, several studies explored cross-lingual aspects [25,26,27]. They tried to build adaptive captioning models that could work well with multiple languages instead of only specific ones.…”
Section: Previous Approaches
Mentioning confidence: 99%
“…The attention mechanism has been widely used in tasks such as natural language description [33], machine translation [34], image feature extraction [35,36], and image classification [37,38]. In essence, the attention mechanism is a weight probability distribution mechanism that assigns larger weights to important content and smaller weights to other content.…”
Section: Ph(i) = Ph I
Mentioning confidence: 99%
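The statement above characterizes attention as a probability distribution of weights over content. A toy sketch, assuming made-up relevance scores, of how a softmax turns such scores into weights that sum to one:

```python
import numpy as np

scores = np.array([2.0, 0.5, -1.0, 0.1])          # hypothetical relevance scores for 4 regions
weights = np.exp(scores) / np.exp(scores).sum()   # softmax: larger score -> larger weight
print(weights.round(2), weights.sum())            # approx [0.70 0.16 0.03 0.11], sums to 1.0
```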
“…Additionally, the Lu et al. (2018) study offered a unique model based on Visual Attention that not only gives a greater visual comprehension of the model's judgments, but also greatly outperforms previous state-of-the-art baseline techniques for this task. In another study, Wang et al. (2020) evaluated their proposed model on the most popular benchmark datasets. We report an improvement of 3.9% over existing state-of-the-art approaches for cross-lingual image captioning on the Flickr8k CN dataset on the CIDEr metric.…”
Section: Consumers' Attention
Mentioning confidence: 99%