2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.503

Image Captioning with Semantic Attention

Abstract: Automatically generating a natural language description of an image has attracted interest recently, both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing. Existing approaches are either top-down, which start from a gist of an image and convert it into words, or bottom-up, which come up with words describing various aspects of an image and then combine them. In this paper, we propose a new algorithm that combines both approaches through a model of semantic attention.
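The abstract's combination of a top-down image gist with bottom-up attribute words can be made concrete with a short sketch. Below is a minimal, hypothetical PyTorch reading of semantic attention: the decoder's hidden state scores a set of detected attribute-word embeddings and mixes them into a context vector. Class and variable names (`SemanticAttention`, `attr_embeds`) and all dimensions are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class SemanticAttention(nn.Module):
    """Hypothetical sketch: at each decoding step, re-weight detected
    attribute-word embeddings against the current hidden state and
    return their weighted mixture as a semantic context vector."""

    def __init__(self, hidden_dim, embed_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim + embed_dim, 1)

    def forward(self, hidden, attr_embeds):
        # hidden:      (batch, hidden_dim)    current decoder state
        # attr_embeds: (batch, k, embed_dim)  embeddings of k detected attributes
        k = attr_embeds.size(1)
        h = hidden.unsqueeze(1).expand(-1, k, -1)                   # (batch, k, hidden_dim)
        scores = self.score(torch.cat([h, attr_embeds], dim=-1))    # (batch, k, 1)
        weights = torch.softmax(scores, dim=1)                      # attention over the k attributes
        return (weights * attr_embeds).sum(dim=1)                   # (batch, embed_dim) context
```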

Cited by 1,457 publications (920 citation statements); citing statements published 2017–2023. References 26 publications.
“…In recent years, much work has been published on image captioning, including [3,4,9,12,20,22,28,31,33], to name a few. Many proposed captioning models exploit RNN-based decoders to generate a sequence of words from encoded representation of input images.…”
Section: Related Work (mentioning)
confidence: 99%
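As a concrete companion to the excerpt above, here is a minimal sketch of the encode-then-decode pattern it describes: a CNN feature vector conditions an RNN that emits one word per time step. All names and sizes (`CaptionDecoder`, `feat_dim`, the GRU cell) are assumptions for illustration, not any specific cited model.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Sketch of a generic RNN-based caption decoder: an encoded image
    feature initializes the hidden state, then one word is predicted
    per time step (teacher forcing during training)."""

    def __init__(self, feat_dim, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)   # image feature -> initial state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRUCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feat, captions):
        # image_feat: (batch, feat_dim); captions: (batch, seq_len) token ids
        h = torch.tanh(self.init_h(image_feat))
        logits = []
        for t in range(captions.size(1)):
            h = self.rnn(self.embed(captions[:, t]), h)  # one word per time step
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, seq_len, vocab_size)
```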
“…Most captioning models are equipped with RNN-based decoders (e.g. [3,22,25,28,31,33]), which predict a word at every time step, based only on the current input and a single or a few hidden states as an implicit summary of all previous history. Thus, RNNs and their variants often fail to capture long-term dependencies, which could worsen if one also wants to incorporate prior knowledge.…”
Section: Introduction (mentioning)
confidence: 99%
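The excerpt's point, that at step t the model sees only the previous word plus a hidden state summarizing all earlier history, can be illustrated with a short greedy-decoding loop over the hypothetical `CaptionDecoder` sketched earlier. Note that nothing but `h` carries information across steps, which is exactly the bottleneck the citing paper criticizes.

```python
import torch

@torch.no_grad()
def greedy_decode(decoder, image_feat, bos_id, eos_id, max_len=20):
    """Illustrative greedy decoding with the hypothetical CaptionDecoder:
    the entire generation history is compressed into the single state h."""
    h = torch.tanh(decoder.init_h(image_feat))
    word = torch.full((image_feat.size(0),), bos_id, dtype=torch.long)
    caption = []
    for _ in range(max_len):
        h = decoder.rnn(decoder.embed(word), h)   # all history lives in h
        word = decoder.out(h).argmax(dim=-1)      # greedy next-word choice
        caption.append(word)
        if (word == eos_id).all():
            break
    return torch.stack(caption, dim=1)            # (batch, generated_len)
```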
“…CNN has proven to be successful in processing image-like data, while RNN is more appropriate for modeling sequential data. Recently, several works [8,23,44,48,52,54] have attempted to combine them, building various CNN-RNN frameworks. Generally, the combination can be divided into two types: the unified combination and the cascaded combination.…”
Section: Usage of CNN-RNN Framework (mentioning)
confidence: 99%
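To make the two combination types concrete, here is a hedged sketch of one common reading: in a cascaded design the CNN runs once and hands a fixed feature to a separate RNN stage, while in a unified design the image feature enters the recurrence at every step so the pair trains as one unit. Module names and the exact wiring are assumptions, not the cited papers' architectures.

```python
import torch
import torch.nn as nn

class CascadedCNNRNN(nn.Module):
    """Cascaded combination (assumed reading): vision stage first,
    then a separate language stage consumes its fixed output."""

    def __init__(self, cnn, decoder):
        super().__init__()
        self.cnn, self.decoder = cnn, decoder

    def forward(self, image, captions):
        feat = self.cnn(image)                  # stage 1: run CNN once
        return self.decoder(feat, captions)     # stage 2: e.g. CaptionDecoder above

class UnifiedCNNRNN(nn.Module):
    """Unified combination (assumed reading): the image feature is fed
    into the recurrence at every step, so CNN and RNN act as one unit."""

    def __init__(self, cnn, embed, feat_dim, embed_dim, hidden_dim, vocab_size):
        super().__init__()
        self.cnn = cnn
        self.embed = embed
        self.rnn = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image, captions):
        feat = self.cnn(image)                                   # reused at each step
        h = feat.new_zeros(feat.size(0), self.rnn.hidden_size)
        logits = []
        for t in range(captions.size(1)):
            x = torch.cat([self.embed(captions[:, t]), feat], dim=-1)
            h = self.rnn(x, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)
```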
“…The cascaded CNN-RNN frameworks are often intended for different tasks, rather than image classification. For example, [8,45,52] employed CNN-RNN to address the image captioning task, and [50] utilized CNN-RNN to rank the tag list based on the visual importance.…”
Section: Usage of CNN-RNN Framework (mentioning)
confidence: 99%