2021
DOI: 10.1109/tmm.2020.2976552

Integrating Part of Speech Guidance for Image Captioning

Cited by 40 publications (7 citation statements)
References 42 publications
“…Image captioning provides a variety of approaches that link the visual contents with normal language, e.g., explaining images with textual descriptions [11,12]. In the existing literature, artificial neural network-based models were utilized to encode visual information with pre-trained classification networks such as CNN and RNN [13].…”
Section: Introduction (mentioning)
confidence: 99%
“…While this method was the first attempt toward integrating the POS to emphasize the images, the words with multiple POSs were not considered because a pre-made POS dictionary was used. Zhang et al. [11] proposed the POS Guidance module, a method to use POS as a guide in image captioning. They proposed two models using POS as a guide for the inject-based method and the merge-based method, which are the most used among the image captioning methods analyzed by Tanti et al. [45].…”
Section: Theoretical Background (mentioning)
confidence: 99%
“…Most studies in the image caption area apply the encoder-decoder framework, which consists of an encoder that extracts features from an image and a decoder that generates sentences due to the development of deep learning. Unlike conventional methods [6]-[9], this structure can create various captions from scenes without using a fixed sentence template [11]-[14]. This description has a more unconstrained structure than before, detailing the context.…”
Section: Introduction (mentioning)
confidence: 99%
“…Image captioning [1], [2], [3] aims at describing the content and event of an image using a couple of words. We can Fig.…”
Section: Introduction (mentioning)
confidence: 99%