2018
DOI: 10.1007/978-3-030-01249-6_32
“Factual” or “Emotional”: Stylized Image Captioning with Adaptive Learning and Attention

Abstract: Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, it requires the system to generate a caption that has a specific style (e.g., humorous, romantic, positive, and negative) while describing the image content semantically accurately. In this paper, we propose a novel stylized image captioning model that effectively takes both requirements into consideration. To this end, we first devise a new variant of LSTM, named style-factual LSTM, as the building blo…

Cited by 80 publications (49 citation statements)
References 27 publications
“…For the task of auto-generating factual and functional captions for drug paraphernalia, there is much room for future exploration from an algorithmic perspective. Some recent image captioning studies [26,21,31] have constructed variant LSTM language models to learn factual and non-factual knowledge in corpora. Some studies [32,21,33,34] have allowed for learning non-factual knowledge in unpaired corpora via weakly supervised or unsupervised methods.…”
Section: Discussion
confidence: 99%
“…Gan et al. [21] designed a model called StyleNet, in which the weight matrices in LSTM networks are decomposed into several factors that are used to generate factual and stylized captions. Chen et al. [31] proposed a variant of LSTM called Style-Factual LSTM. In this model, two groups of matrices are trained to capture factual and stylized information, respectively.…”
Section: Related Work A: Image Captioning Datasets
confidence: 99%
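The citation statement above describes the shared mechanism behind StyleNet and Style-Factual LSTM: the LSTM's weight matrices are split into a "factual" group and a "style" group, and the two are combined when computing the gates. The following is a minimal illustrative sketch of that idea, not the authors' exact formulation; the scalar blending gate `g` and all matrix names (`Wf`, `Ws`, `Uf`, `Us`) are simplifying assumptions introduced here for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def style_factual_lstm_step(x, h, c, Wf, Ws, Uf, Us, b, g):
    """One LSTM step whose input/recurrent weights blend a 'factual'
    set (Wf, Uf) and a 'style' set (Ws, Us) via a scalar gate g in [0, 1].
    Illustrative sketch only, not the published model.
    Shapes: Wf, Ws: (4*H, D); Uf, Us: (4*H, H); b: (4*H,)."""
    W = (1.0 - g) * Wf + g * Ws          # blended input-to-hidden weights
    U = (1.0 - g) * Uf + g * Us          # blended hidden-to-hidden weights
    z = W @ x + U @ h + b                # stacked gate pre-activations
    H = h.shape[0]
    i = sigmoid(z[0 * H:1 * H])          # input gate
    f = sigmoid(z[1 * H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])          # output gate
    u = np.tanh(z[3 * H:4 * H])          # candidate cell update
    c_new = f * c + i * u
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Toy usage with random weights (dimensions are arbitrary).
rng = np.random.default_rng(0)
D, H = 5, 4
x = rng.standard_normal(D)
h = np.zeros(H)
c = np.zeros(H)
Wf = rng.standard_normal((4 * H, D))
Ws = rng.standard_normal((4 * H, D))
Uf = rng.standard_normal((4 * H, H))
Us = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)

# g=0.0 uses only factual weights; g=1.0 only style weights.
h1, c1 = style_factual_lstm_step(x, h, c, Wf, Ws, Uf, Us, b, g=0.3)
```

With `g = 0` the step reduces to a plain "factual" LSTM, so a model can be trained on paired factual data first and then learn the style matrices from a smaller stylized corpus — the training regime the cited works describe at a high level.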
“…After that, some methods [63,1,35] tried to integrate the vanilla CNN-RNN architecture with neural attention mechanisms, like semantic attention [35], and bottom-up/top-down attention [1], to name a few representative ones. Another popular trend [15,47,24,5,42,37,6] in this area focuses on improving the discriminability of caption generations, such as stylized image captioning [15,6], personalized image captioning [47], and context-aware image captioning [24,5].…”
Section: Related Work
confidence: 99%
“…The authors declare no conflict of interest.…”
[Flattened comparison-table fragment from the citing survey omitted; its column structure is not recoverable from the extract.]
Section: Conflicts of Interest
confidence: 99%