Proceedings of the 24th ACM International Conference on Multimedia 2016
DOI: 10.1145/2964284.2964299

Image Captioning with Deep Bidirectional LSTMs

Abstract: This work presents an end-to-end trainable deep bidirectional LSTM (Long Short-Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning long-term visual-language interactions by making use of history and future context information in a high-level semantic space. Two novel deep bidirectional variant models, in which we increase the depth of the nonlinearity transition in different ways, are proposed to learn hierarc…
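The core idea in the abstract (a forward and a backward LSTM reading the word sequence in opposite directions, with their hidden states combined per position) can be sketched minimally. This is not the paper's architecture: the dimensions, random parameters, and function names below are illustrative, and the CNN image feature the paper feeds into the LSTMs is omitted for brevity.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked in z as [input, forget, output, candidate]."""
    z = W @ x + U @ h + b
    n = h.size
    i, f, o = (1 / (1 + np.exp(-z[k * n:(k + 1) * n])) for k in range(3))
    g = np.tanh(z[3 * n:4 * n])
    c = f * c + i * g
    return o * np.tanh(c), c

def bidirectional_states(xs, params_f, params_b, hidden=8):
    """Run a forward and a backward LSTM over the embedded word sequence xs
    and concatenate their hidden states at each position."""
    def run(seq, params):
        h, c, out = np.zeros(hidden), np.zeros(hidden), []
        for x in seq:
            h, c = lstm_step(x, h, c, *params)
            out.append(h)
        return out
    fwd = run(xs, params_f)
    bwd = run(xs[::-1], params_b)[::-1]  # backward pass, re-aligned to positions
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]

rng = np.random.default_rng(0)
emb, hid, T = 6, 8, 5  # toy sizes: embedding dim, hidden dim, sequence length
def make_params():
    return (0.1 * rng.standard_normal((4 * hid, emb)),
            0.1 * rng.standard_normal((4 * hid, hid)),
            np.zeros(4 * hid))
xs = [rng.standard_normal(emb) for _ in range(T)]
states = bidirectional_states(xs, make_params(), make_params(), hidden=hid)
```

Each position thus sees both history (forward state) and future context (backward state), which is the "history and future context information" the abstract refers to.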

Cited by 222 publications (115 citation statements)
References 33 publications
“…Datasets and evaluation metrics per model:
Kiros et al. 2014 [69]: IAPR TC-12, SBU (BLEU, PPLX)
Kiros et al. 2014 [70]: Flickr [90] Flickr 8k, UIUC (BLEU, R@K)
You et al. 2016 [156]: Flickr 30K, MS COCO (BLEU, METEOR, ROUGE, CIDEr)
Yang et al. 2016 [153]: Visual Genome (METEOR, AP, IoU)
Anne et al. 2016 [6]: MS COCO, ImageNet (BLEU, METEOR)
Yao et al. 2017 [155]: MS COCO (BLEU, METEOR, ROUGE, CIDEr)
Lu et al. 2017 [88]: Flickr 30K, MS COCO (BLEU, METEOR, CIDEr)
Chen et al. 2017 [21]: Flickr 8K/30K, MS COCO (BLEU, METEOR, ROUGE, CIDEr)
Gan et al. 2017 [41]: Flickr [85] MS COCO (SPIDEr, Human Evaluation)
Gu et al. 2017 [51]: Flickr 30K, MS COCO (BLEU, METEOR, CIDEr, SPICE)
Yao et al. 2017 [154]: MS COCO, ImageNet (METEOR)
Rennie et al. 2017 [120]: MS COCO (BLEU, METEOR, CIDEr, ROUGE)
Vsub et al. 2017 [140]: MS COCO, ImageNet (METEOR)
Zhang et al. 2017 [161]: MS COCO (BLEU, METEOR, ROUGE, CIDEr)
Wu et al. 2018 [150]: Flickr 8K/30K, MS COCO (BLEU, METEOR, CIDEr)
Aneja et al. 2018 [5]: MS COCO (BLEU, METEOR, ROUGE, CIDEr)
Wang et al. 2018 [147]: MS COCO (BLEU, METEOR, ROUGE, CIDEr)
[21,59,61,144,150,152] have performed experiments using the dataset. Two sample results by Jia et al. [59] on this dataset are shown in Figure 13. 4.1.4 Visual Genome Dataset.…”
Section: Reference (mentioning)
confidence: 99%
“…Seq2Seq-f+b: it fills the blanks with both Seq2Seq-f and Seq2Seq-b, and then selects the output with the higher probability assigned by the two seq2seq models. This method is used in Wang et al. (2016). (Berglund et al., 2015) on a well-trained seq2seq model with a BiRNN as the decoder to fill the blanks.…”
Section: Baselines (mentioning)
confidence: 99%
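The Seq2Seq-f+b selection rule quoted above (each direction proposes its best filler, and the proposal with the higher model probability wins) can be sketched as follows. The candidate words and log-probability tables are made up for the example; `fill_blank_f_plus_b` is a hypothetical name, not an API from the cited work.

```python
import math

def fill_blank_f_plus_b(candidates, logp_forward, logp_backward):
    """Seq2Seq-f+b baseline: the forward (left-to-right) and backward
    (right-to-left) models each pick their best candidate filler, then
    the one assigned the higher probability by its own model is kept."""
    best_f = max(candidates, key=logp_forward)
    best_b = max(candidates, key=logp_backward)
    return best_f if logp_forward(best_f) >= logp_backward(best_b) else best_b

# Toy log-probabilities: the backward model is more confident here.
cands = ["cat", "dog", "car"]
lp_f = {"cat": math.log(0.5), "dog": math.log(0.3), "car": math.log(0.2)}.get
lp_b = {"cat": math.log(0.2), "dog": math.log(0.6), "car": math.log(0.2)}.get
choice = fill_blank_f_plus_b(cands, lp_f, lp_b)
```

Note that this is a post-hoc arbitration between two independently trained models, which is exactly the limitation the later citation statements criticize.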
“…Most existing models generate the word sequence one word at a time in a front-to-back manner, without considering the influence of the subsequent words on the whole sentence. Bidirectional LSTMs have been developed to generate sentences from two directions independently [38,39]. Essentially, this works the same way as before, since the forward and backward LSTMs are still trained without interaction.…”
Section: Phased Trainable Models (mentioning)
confidence: 99%
“…The encoder-decoder models usually use forward LSTMs to generate the words of a sentence from beginning to end [1,2,5]. Recently, bidirectional LSTMs have been developed to generate sentences from two directions independently, i.e., a forward LSTM and a backward LSTM are trained without interaction [38,39]. However, three problems remain unsolved.…”
Section: Introduction (mentioning)
confidence: 99%
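The "two directions without interaction" point made in the last two citation statements can be illustrated with a toy decoder: one model extends the sentence left to right, another right to left, and the two never exchange information while decoding, so they can produce different captions. The bigram tables below are invented for the example; real models would score and rank the two outputs afterwards.

```python
# Toy deterministic "models": each maps the current word to the next one.
FWD = {"<s>": "a", "a": "dog", "dog": "runs", "runs": "</s>"}   # left to right
BWD = {"</s>": "fast", "fast": "runs", "runs": "dog", "dog": "<s>"}  # right to left

def decode(table, start, stop):
    """Greedily follow the table from the start token until the stop token."""
    out, w = [], start
    while True:
        w = table[w]
        if w == stop:
            return out
        out.append(w)

forward_caption = decode(FWD, "<s>", "</s>")
backward_caption = decode(BWD, "</s>", "<s>")[::-1]  # reverse into reading order
```

Since nothing constrains the two decoders to agree, `forward_caption` and `backward_caption` differ here, which is the limitation these works set out to address.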