2019
DOI: 10.1007/s11263-019-01206-z

LCEval: Learned Composite Metric for Caption Evaluation

Abstract: Automatic evaluation metrics hold a fundamental importance in the development and fine-grained analysis of captioning systems. While current evaluation metrics tend to achieve an acceptable correlation with human judgements at the system level, they fail to do so at the caption level. In this work, we propose a neural network-based learned metric to improve the caption-level caption evaluation. To get a deeper insight into the parameters which impact a learned metric's performance, this paper investigates the …

Citations: cited by 10 publications (5 citation statements)
References: 43 publications
“…SubICap-1k also achieves the highest SPICE and METEOR scores amongst the SOTA models. This shows that our model generates captions which are semantically better than those generated by other models [28].…”
Section: Quantitative Analysis (mentioning)
confidence: 68%
“…7.1, the n-gram based measures tend to overlook the semantics and only focus on the lexical properties of the captions. CIDEr, which is an n-gram based measure [28], prefers captions which have a higher lexical correspondence to the ground truth caption. However, in various cases, it is quite possible that two captions which have different words or structure might carry the same meaning, and vice versa.…”
Section: Qualitative Analysis (mentioning)
confidence: 99%
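
The limitation described in the statement above can be illustrated with a minimal sketch. It computes a plain clipped n-gram precision, which is a simplification and not CIDEr itself (CIDEr additionally applies TF-IDF weighting over n-grams). The function names and the three example captions are assumptions made up for illustration; they do not come from any of the cited papers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate caption against one reference.

    Simplified stand-in for n-gram based caption metrics (not CIDEr).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical captions, invented for this illustration.
reference = "a man is riding a horse"
paraphrase = "someone rides on horseback"   # same meaning, different words
lexical_twin = "a horse is riding a man"    # same words, different meaning

paraphrase_score = ngram_precision(paraphrase, reference, n=1)
twin_unigram_score = ngram_precision(lexical_twin, reference, n=1)
twin_bigram_score = ngram_precision(lexical_twin, reference, n=2)
```

Under these assumptions the paraphrase shares no unigrams with the reference and scores 0.0, while the caption with the swapped meaning scores 1.0 on unigrams and 0.8 on bigrams, which is the kind of mismatch between lexical overlap and meaning that the quoted statement points to.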
“…A number of automatic evaluation metrics have been proposed for captioning, which can be categorized into supervised [15], [16], [17] and unsupervised [18], [19], [20] methods. The work presented here falls in the latter category.…”
Section: Automatic Evaluation Metrics (mentioning)
confidence: 99%
“…Moreover, comparing visual (source) and textual (target) information is not a straightforward task, and greatly adds to the complexity. -Supervised Metrics: Supervised metrics such as NNEval [15] and LCEval [16] combine existing metrics into a single unified measure, which has shown improvement in performance. However, the drawbacks of learned metrics are their high complexity and subjectivity to the training examples.…”
Section: Calculate Similarity (mentioning)
confidence: 99%
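
The idea in the statement above, combining existing metric scores into one learned measure, can be sketched roughly as follows. This is not the LCEval or NNEval architecture: a single logistic unit trained by stochastic gradient descent stands in for the neural network, and the feature rows, labels, and function names are all hypothetical values invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def composite_score(x, weights, bias):
    """Map a vector of existing metric scores to one score in (0, 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_composite(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic unit with per-example gradient steps on the log loss.

    Assumption: a stand-in for a learned metric, not the published model.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = composite_score(x, weights, bias) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy training rows (invented): [n-gram score, alignment score, semantic score].
# Label 1 marks a caption treated as good, 0 as poor; the labels are
# placeholders for whatever supervision signal a real learned metric uses.
train_x = [
    [0.9, 0.8, 0.7],
    [0.7, 0.6, 0.8],
    [0.2, 0.3, 0.1],
    [0.1, 0.2, 0.2],
]
train_y = [1, 1, 0, 0]

weights, bias = train_composite(train_x, train_y)
good = composite_score([0.8, 0.7, 0.75], weights, bias)
bad = composite_score([0.15, 0.25, 0.1], weights, bias)
```

With this toy data the unseen high-scoring row ends up above 0.5 and the low-scoring row below it. The sketch also makes the quoted drawback visible: the learned weights depend entirely on which training examples are supplied.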
“…It learns the most predictive features (learned features) directly from data given a large dataset of labeled examples. In recent years, deep learning techniques have emerged as highly effective methods for prediction and decision-making in a multitude of disciplines including health (hearing aids), computer vision (e.g., object and face identification) [2], [3], [4], [5], natural language processing [6], [7], [8], gesture recognition [9], [10], [11], and robotics [12].…”
Section: Introduction (mentioning)
confidence: 99%