2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015
DOI: 10.1109/cvpr.2015.7299087

CIDEr: Consensus-based image description evaluation

Abstract: Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new triplet-based method of collect…
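For context, a minimal sketch of the consensus metric the paper introduces, following its standard formulation: each sentence is mapped to TF-IDF-weighted n-gram vectors (n = 1 to 4), and a candidate caption is scored by its average cosine similarity to the reference captions for the same image. This is an illustrative sketch only, not the official implementation; it omits the length penalty and count clipping of the CIDEr-D variant, and all function and variable names are assumptions.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All word n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(counts, doc_freq, num_images):
    """TF-IDF weight each n-gram: normalized count times log inverse document frequency."""
    total = sum(counts.values())
    return {g: (c / total) * math.log(num_images / max(1.0, doc_freq[g]))
            for g, c in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cider(candidate, references, corpus_refs, max_n=4):
    """candidate: token list; references: token lists for the same image;
    corpus_refs: per-image lists of reference token lists (used for document frequencies)."""
    num_images = len(corpus_refs)
    score = 0.0
    for n in range(1, max_n + 1):
        # Document frequency: number of images whose references contain the n-gram.
        doc_freq = Counter()
        for refs in corpus_refs:
            doc_freq.update({g for r in refs for g in ngrams(r, n)})
        cand_vec = tfidf(Counter(ngrams(candidate, n)), doc_freq, num_images)
        sims = [cosine(cand_vec, tfidf(Counter(ngrams(r, n)), doc_freq, num_images))
                for r in references]
        score += sum(sims) / len(sims) / max_n  # uniform weight over n = 1..4
    return score
```

The TF-IDF weighting is what encodes consensus: n-grams that appear in many images' references are down-weighted, so a candidate is rewarded for matching the phrasing that is distinctive to its own image.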

Cited by 3,224 publications (1,885 citation statements)
References 32 publications
“…We calculate BLEU (Papineni et al 2002), CIDEr (Vedantam et al 2015a), and METEOR (Denkowski and Lavie 2014) scores between the generated descriptions and their ground-truth descriptions. In all cases, the model trained on VisualGenome performs better.…”
Section: Generating Region Descriptions (mentioning)
confidence: 99%
“…BLEU is the precision of word n-grams between generated and reference sentences. Additionally, scores like METEOR (Vedantam et al, 2015) which capture perplexity of models for a given transcription have gained widespread attention. Perplexity is the geometric mean of the inverse probability for each predicted word.…”
Section: Results (mentioning)
confidence: 99%
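The two quantities the quoted passage defines can be stated compactly. The sketch below is illustrative only (function names are assumptions, not from any cited implementation): clipped n-gram precision, which BLEU combines over several orders with a brevity penalty, and perplexity as the geometric mean of the inverse per-word probabilities.

```python
import math
from collections import Counter

def ngram_precision(candidate, references, n):
    """Clipped n-gram precision (one order of BLEU, without the brevity penalty)."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    clip = Counter()
    for ref in references:
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        for g, c in ref_counts.items():
            clip[g] = max(clip[g], c)  # highest count of this n-gram in any reference
    matched = sum(min(c, clip[g]) for g, c in cand.items())
    return matched / max(1, sum(cand.values()))

def perplexity(word_probs):
    """Geometric mean of inverse per-word probabilities: exp(-mean log p)."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))
```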
“…5.3.2. As known from literature (Elliott and Keller 2013; Vedantam et al 2015), automatic evaluation measures do not always agree with the human evaluation. Here we see that human judges prefer the descriptions from Frame-Video-Concept Fusion approach in terms of correctness, grammar and relevance.…”
Section: LSMDC 15 (mentioning)
confidence: 97%
“…6), we focus our discussion on METEOR and CIDEr scores in the preliminary evaluations in this section. According to Elliott and Keller (2013) and Vedantam et al (2015), METEOR/CIDEr supersede previously used measures in terms of agreement with human judgments.…”
Section: Automatic Metrics (mentioning)
confidence: 99%