2016
DOI: 10.1007/978-3-319-46454-1_24

SPICE: Semantic Propositional Image Caption Evaluation

Abstract: There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs coined SPICE. Extensive evaluations acro…
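For orientation, a minimal sketch of the idea the abstract describes, assuming captions have already been parsed into scene-graph tuples: SPICE scores a candidate caption with an F-score over semantic proposition tuples (objects, attributes, relations) extracted from scene graphs. The function name and example tuples below are invented for illustration; the actual metric builds the tuples with a dependency parse and matches synonyms rather than exact strings.

```python
# Minimal sketch of a SPICE-like score, assuming tuples are already given.
# The real metric (Anderson et al., 2016) parses captions into scene graphs
# and matches tuples with synonym handling; exact matching here is only
# for illustration.

def spice_like_f1(candidate_tuples, reference_tuples):
    """F1 over semantic proposition tuples from two scene graphs."""
    cand = set(candidate_tuples)
    ref = set(reference_tuples)
    if not cand or not ref:
        return 0.0
    matched = cand & ref
    precision = len(matched) / len(cand)
    recall = len(matched) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example tuples: (object,), (object, attribute),
# or (subject, relation, object).
candidate = [("girl",), ("girl", "young"), ("girl", "standing-on", "field")]
reference = [("girl",), ("girl", "young"),
             ("girl", "standing-in", "field"), ("field", "grassy")]
print(spice_like_f1(candidate, reference))  # ~0.571
```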

Cited by 1,369 publications (1,204 citation statements)
References 38 publications (81 reference statements)
“…BLEU-1, BLEU-2, BLEU-3 and BLEU-4) based on the n-gram method of determining string/sentence similarity. It is also equipped with other evaluation metrics such as, METEOR [31], ROUGE-L [32], and SPICE [33]. The detailed results using all of the above metrics for the evaluation of the IAPR TC-12 dataset are shown in Table 3.…”
Section: Discussion
confidence: 99%
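Since the excerpt above characterizes BLEU by n-gram overlap, a short sketch of the underlying quantity may help: modified (clipped) n-gram precision. The full BLEU score combines these precisions over several n-gram orders with a geometric mean and a brevity penalty; the example sentences below are invented for illustration.

```python
# Sketch of modified n-gram precision, the core quantity behind BLEU-n.
# Clipping caps each candidate n-gram count at its count in the reference.
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Modified n-gram precision of a candidate against one reference."""
    cand_ngrams = Counter(tuple(candidate[i:i + n])
                          for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n])
                         for i in range(len(reference) - n + 1))
    if not cand_ngrams:
        return 0.0
    overlap = sum(min(count, ref_ngrams[ng])
                  for ng, count in cand_ngrams.items())
    return overlap / sum(cand_ngrams.values())

cand = "a young girl standing on a field".split()
ref = "a young girl is standing in the field".split()
print(ngram_precision(cand, ref, 1))  # unigram precision, ~0.714
print(ngram_precision(cand, ref, 2))  # bigram precision, ~0.333
```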
“…Following standard practice (Anderson et al, 2016; Elliott and Keller, 2014), we compared with … Text in red is extra information, while text in green is missing information. sim(x, y) is the average similarity between machine-identified and true error vectors over image regions.…”
Section: Methods
confidence: 99%
“…19 The automatic evaluation measures include BLEU-1,-2,-3,-4 (Papineni et al 2002), METEOR (Denkowski and Lavie 2014), ROUGE-L (Lin 2004), and CIDEr . We also use the recently proposed evaluation measure SPICE (Anderson et al 2016), which aims to compare the semantic content of two descriptions, by matching the information contained in dependency parse trees for both descriptions. While we report all measures for the final evaluation in the LSMDC (Sect.…”
Section: Automatic Metrics
confidence: 99%