Learning-based Composite Metrics for Improved Caption Evaluation

Sharif, Naeha; White, Lyndon; Bennamoun, Mohammed; Shah, Syed Afaq Ali

doi:10.18653/v1/p18-3003

Cited by 18 publications

(15 citation statements)

References 23 publications


“…More generally, the metrics can come from any of the three categories discussed such as semantic, syntactic or lexical. Sharif et al [44] found in particular that covering all three categories improved the overall results.…”

Section: Handcrafted Metrics [mentioning]

confidence: 98%

LCEval: Learned Composite Metric for Caption Evaluation

Sharif, White, Bennamoun, et al. 2019

Int J Comput Vis

Self Cite


Automatic evaluation metrics hold a fundamental importance in the development and fine-grained analysis of captioning systems. While current evaluation metrics tend to achieve an acceptable correlation with human judgements at the system level, they fail to do so at the caption level. In this work, we propose a neural network-based learned metric to improve the caption-level caption evaluation. To get a deeper insight into the parameters which impact a learned metric's performance, this paper investigates the relationship between different linguistic features and the caption-level correlation of the learned metrics. We also compare metrics trained with different training examples to measure the variations in their evaluation. Moreover, we perform a robustness analysis, which highlights the sensitivity of learned and handcrafted metrics to various sentence perturbations. Our empirical analysis shows that our proposed metric not only outperforms the existing metrics in terms of caption-level correlation but it also shows a strong system-level correlation against human assessments.
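The distinction the abstract draws between system-level and caption-level correlation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy scores, the system names, and the choice of Kendall's tau (the simple tau-a variant without tie correction) as the correlation statistic. It is not the authors' evaluation code.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs, no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def mean(values):
    return sum(values) / len(values)


def caption_level_tau(systems):
    """Correlate metric and human scores over every individual caption."""
    pairs = [pair for captions in systems.values() for pair in captions]
    return kendall_tau([m for m, _ in pairs], [h for _, h in pairs])


def system_level_tau(systems):
    """Average scores per system first, then correlate the per-system means."""
    metric_means = [mean([m for m, _ in caps]) for caps in systems.values()]
    human_means = [mean([h for _, h in caps]) for caps in systems.values()]
    return kendall_tau(metric_means, human_means)


# Hypothetical data: each caption is a (metric_score, human_score) pair.
toy_systems = {
    "sys_a": [(0.9, 2), (0.2, 5), (0.7, 4)],
    "sys_b": [(0.3, 3), (0.6, 1), (0.1, 2)],
    "sys_c": [(0.5, 4), (0.8, 5), (0.9, 3)],
}

print("system-level tau :", system_level_tau(toy_systems))
print("caption-level tau:", caption_level_tau(toy_systems))
```

In this toy data the per-system averages rank the three systems exactly as the human averages do, while individual captions are often ranked differently, which is the kind of gap the abstract describes.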


“…However, commonly used evaluation metrics consider only some specific features (e.g., lexical or semantic) of languages. Sharif et al [125] proposed learning-based composite metrics for evaluation of image captions. The composite metric incorporates a set of linguistic features to achieve the two main aspects of assessment and shows improved performances.…”

Section: SPICE (Semantic Propositional Image Caption Evaluation) [mentioning]

confidence: 99%

A Comprehensive Survey of Deep Learning for Image Captioning

et al. 2019


Generating a description of an image is called image captioning. Image captioning requires to recognize the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to present a comprehensive review of existing deep learning-based image captioning techniques. We discuss the foundation of the techniques to analyze their performances, strengths and limitations. We also discuss the datasets and the evaluation metrics popularly used in deep learning based automatic image captioning.


“…These provided human annotations work as a reference while evaluating predicted descriptions. Adequacy, fidelity, and eloquence of the translation are the main aspects of machine translation observed by humans to do the evaluation [72]. The most desirable characteristic of an automatic evaluation metric is its strong correlation with human scores [73], i.e., the closer the generated or predicted translation to a professional human translation is considered better.…”

Section: Evaluation Metrics [mentioning]

confidence: 99%

“…The most desirable characteristic of an automatic evaluation metric is its strong correlation with human scores [73], i.e., the closer the generated or predicted translation to a professional human translation is considered better. The accuracy of a metric is considered to be higher if it assigns a greater score to the caption favored by humans [72].…”

Section: Evaluation Metrics [mentioning]

confidence: 99%
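The accuracy notion in the statement above (a metric is more accurate if it scores the human-preferred caption higher) can be sketched as a pairwise comparison. The function name, the input layout, and the toy numbers are hypothetical, and counting ties as misses is an assumption of this sketch rather than something the quoted text specifies.

```python
def pairwise_accuracy(metric_scores, human_preferences):
    """Fraction of caption pairs where the metric agrees with the human choice.

    metric_scores: list of (score_a, score_b) tuples, one per caption pair.
    human_preferences: list of "a" or "b", the caption humans preferred.
    A tie between the two metric scores is counted as a miss (assumption).
    """
    correct = 0
    for (score_a, score_b), preferred in zip(metric_scores, human_preferences):
        if score_a == score_b:
            continue  # the metric expresses no preference: counted as incorrect
        metric_choice = "a" if score_a > score_b else "b"
        if metric_choice == preferred:
            correct += 1
    return correct / len(human_preferences)


# Hypothetical scores for three caption pairs.
scores = [(0.8, 0.3), (0.2, 0.6), (0.5, 0.7)]
preferences = ["a", "b", "a"]
print(pairwise_accuracy(scores, preferences))  # agrees on 2 of the 3 pairs
```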

Video Description: Datasets & Evaluation Metrics

Rafiq, Choi 2021

IEEE Access


Rapid expansion and the novel phenomenon of deep learning have manifested a variety of proposals and concerns in the area of video description, particularly in the recent past. Automatic event localization and textual alternatives generation for the complex and diverse visual data supplied in a video can be articulated as video description, bridging the two leading realms of computer vision and natural language processing. Several sequence-to-sequence algorithms are being proposed by splitting the task into two segments, namely encoding, i.e., getting and learning the insights of the visual representations, and decoding, i.e., transforming the learned representations to a sequence of words, one at a time. Implemented deep learning approaches have gained a lot of recognition for the reason of their superior computing capabilities and tremendous performance. However, the accomplishment of these algorithms strongly depends on the nature, diversity, and amount of data they are trained, validated and tested on. Techniques applied on insufficient and inadequate train/test data cannot deliver promising conclusions, consequently making it complicated to evaluate the quality of generated results. This survey focuses explicitly on the benchmark datasets, and evaluation metrics developed and deployed for video description tasks and their capabilities and limitations. Finally, we concluded with the need for essential enhancements and encouraging research directions on the topic.
