2018
DOI: 10.1007/978-3-030-01249-6_32
“Factual” or “Emotional”: Stylized Image Captioning with Adaptive Learning and Attention

Abstract: Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, it requires the system to generate a caption that has a specific style (e.g., humorous, romantic, positive, and negative) while describing the image content semantically accurately. In this paper, we propose a novel stylized image captioning model that effectively takes both requirements into consideration. To this end, we first devise a new variant of LSTM, named style-factual LSTM, as the building blo…

Cited by 80 publications (49 citation statements)
References 27 publications
“…For the task of auto-generating factual and functional captions for drug paraphernalia, there is much room for future exploration from an algorithmic perspective. Some recent image captioning studies [26,21,31] have constructed variant LSTM language models to learn factual and non-factual knowledge in corpora. Some studies [32,21,33,34] have allowed for learning non-factual knowledge in unpaired corpora via weakly supervised or unsupervised methods.…”
Section: Discussion
confidence: 99%
“…Gan et al. [21] designed a model called StyleNet, in which the weight matrices in LSTM networks are decomposed into several factors that are used to generate factual and stylized captions. Chen et al. [31] proposed a variant of LSTM called Style-Factual LSTM. In this model, two groups of matrices are trained to capture factual and stylized information, respectively.…”
Section: Related Work A: Image Captioning Datasets
confidence: 99%
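The citation statement above describes the shared mechanism behind StyleNet and Style-Factual LSTM: the LSTM's weight matrices are split into a "factual" group and a "style" group, and the two are combined when computing the gates. The following is a minimal illustrative sketch of that idea, not the authors' exact formulation; the scalar blending gate `g` and all matrix names (`Wf`, `Ws`, `Uf`, `Us`) are simplifying assumptions introduced here for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def style_factual_lstm_step(x, h, c, Wf, Ws, Uf, Us, b, g):
    """One LSTM step whose input/recurrent weights blend a 'factual'
    set (Wf, Uf) and a 'style' set (Ws, Us) via a scalar gate g in [0, 1].
    Illustrative sketch only, not the published model.
    Shapes: Wf, Ws: (4*H, D); Uf, Us: (4*H, H); b: (4*H,)."""
    W = (1.0 - g) * Wf + g * Ws          # blended input-to-hidden weights
    U = (1.0 - g) * Uf + g * Us          # blended hidden-to-hidden weights
    z = W @ x + U @ h + b                # stacked gate pre-activations
    H = h.shape[0]
    i = sigmoid(z[0 * H:1 * H])          # input gate
    f = sigmoid(z[1 * H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])          # output gate
    u = np.tanh(z[3 * H:4 * H])          # candidate cell update
    c_new = f * c + i * u
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Toy usage with random weights (dimensions are arbitrary).
rng = np.random.default_rng(0)
D, H = 5, 4
x = rng.standard_normal(D)
h = np.zeros(H)
c = np.zeros(H)
Wf = rng.standard_normal((4 * H, D))
Ws = rng.standard_normal((4 * H, D))
Uf = rng.standard_normal((4 * H, H))
Us = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)

# g=0.0 uses only factual weights; g=1.0 only style weights.
h1, c1 = style_factual_lstm_step(x, h, c, Wf, Ws, Uf, Us, b, g=0.3)
```

With `g = 0` the step reduces to a plain "factual" LSTM, so a model can be trained on paired factual data first and then learn the style matrices from a smaller stylized corpus — the training regime the cited works describe at a high level.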
“…After that, some methods [63,1,35] tried to integrate the vanilla CNN-RNN architecture with neural attention mechanisms, like semantic attention [35], and bottom-up/top-down attention [1], to name a few representative ones. Another popular trend [15,47,24,5,42,37,6] in this area focuses on improving the discriminability of caption generations, such as stylized image captioning [15,6], personalized image captioning [47], and context-aware image captioning [24,5].…”
Section: Related Work
confidence: 99%
“…The authors declare no conflict of interest.…”
[Flattened comparison-table fragment from the citing survey omitted; its column structure is not recoverable from the extract.]
Section: Conflicts of Interest
confidence: 99%