Proceedings of ACL 2018, Student Research Workshop 2018
DOI: 10.18653/v1/p18-3003

Learning-based Composite Metrics for Improved Caption Evaluation

Abstract: The evaluation of image caption quality is a challenging task, which requires the assessment of two main aspects in a caption: adequacy and fluency. These quality aspects can be judged using a combination of several linguistic features. However, most of the current image captioning metrics focus only on specific linguistic facets, such as the lexical or semantic, and fail to meet a satisfactory level of correlation with human judgements at the sentence-level. We propose a learning-based framework to incorporat…

Cited by 18 publications (15 citation statements)
References 23 publications
“…More generally, the metrics can come from any of the three categories discussed such as semantic, syntactic or lexical. Sharif et al [44] found in particular that covering all three categories improved the overall results.…”
Section: Handcrafted Metrics
Citation type: mentioning
confidence: 98%
“…However, commonly used evaluation metrics consider only some specific features (e.g., lexical or semantic) of languages. Sharif et al [125] proposed learning-based composite metrics for evaluation of image captions. The composite metric incorporates a set of linguistic features to achieve the two main aspects of assessment and shows improved performances.…”
Section: Spice Spice (Semantic Propositional Image Caption Evaluation)
Citation type: mentioning
confidence: 99%
“…These provided human annotations work as a reference while evaluating predicted descriptions. Adequacy, fidelity, and eloquence of the translation are the main aspects of machine translation observed by humans to do the evaluation [72]. The most desirable characteristic of an automatic evaluation metric is its strong correlation with human scores [73], i.e., the closer the generated or predicted translation to a professional human translation is considered better.…”
Section: Evaluation Metrics
Citation type: mentioning
confidence: 99%
“…The most desirable characteristic of an automatic evaluation metric is its strong correlation with human scores [73], i.e., the closer the generated or predicted translation to a professional human translation is considered better. The accuracy of a metric is considered to be higher if it assigns a greater score to the caption favored by humans [72].…”
Section: Evaluation Metrics
Citation type: mentioning
confidence: 99%
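
The notion of accuracy in the last statement (a metric is more accurate when it gives the higher score to the caption humans prefer) can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of Sharif et al. or of the citing papers: `unigram_overlap` is a hypothetical stand-in metric, `pairwise_accuracy` is a name chosen here, and the judgement tuples are made-up toy examples rather than real annotation data.

```python
from typing import Callable, Sequence, Tuple

# (reference, caption_a, caption_b, human_prefers_a) -- hypothetical layout
Judgement = Tuple[str, str, str, bool]


def unigram_overlap(candidate: str, reference: str) -> float:
    """Toy stand-in metric: share of candidate word types found in the reference."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    return len(cand & ref) / len(cand) if cand else 0.0


def pairwise_accuracy(
    judgements: Sequence[Judgement],
    metric: Callable[[str, str], float] = unigram_overlap,
) -> float:
    """Fraction of caption pairs where the metric agrees with the human preference."""
    if not judgements:
        raise ValueError("need at least one judgement")
    agreements = 0
    for reference, cap_a, cap_b, human_prefers_a in judgements:
        # The metric "prefers" caption A when it scores A strictly higher than B.
        metric_prefers_a = metric(cap_a, reference) > metric(cap_b, reference)
        if metric_prefers_a == human_prefers_a:
            agreements += 1
    return agreements / len(judgements)


# Made-up toy judgements, for illustration only.
toy_judgements = [
    ("a dog runs on the beach", "a dog runs on sand", "a cat sleeps", True),
    ("two men play soccer", "a car", "two men play football", False),
    ("a man rides a bike", "a man rides a horse", "person cycling on a bike", False),
]

print(pairwise_accuracy(toy_judgements))
```

In the third toy pair the overlap metric prefers the caption that shares more words with the reference even though the human preference goes the other way, which is the kind of disagreement a purely lexical metric can produce.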