Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1650

Informative Image Captioning with External Sources of Information

Abstract: An image caption should fluently present the essential information in a given image, including informative, fine-grained entity mentions and the manner in which these entities interact. However, current captioning models are usually trained to generate captions that only contain common object names, thus falling short on an important "informativeness" dimension. We present a mechanism for integrating image information together with fine-grained labels (assumed to be generated by some upstream models) into a ca…

Citations: cited by 26 publications (19 citation statements)
References: 24 publications
“…Following (Zhao et al, 2019), we obtain the subjects' ratings for fidelity (the first caption is superior in terms of making less mistakes?), informativeness (the first caption provides more informative and detailed description?…”
Section: Quantitative Analysis (mentioning)
confidence: 99%
“…To overcome the limitations imposed by the automatic metrics, several studies evaluate their models using human judgments (Sharma et al 2018; Zhao et al 2019; Dognin et al 2019; Forbes et al 2019). However, none of them utilizes the human-rated captions in the model evaluations.…”
Section: Related Work (mentioning)
confidence: 99%
“…Image captioning is the task of automatically generating fluent natural language descriptions for an input image. However, measuring the quality of generated captions in an automatic manner is a challenging and yet-unsolved task; therefore, human evaluations are often required to assess the complex semantic relationships between a visual scene and a generated caption (Sharma et al 2018; Cui et al 2018; Zhao et al 2019). As a result, there is a mismatch between the training objective of the captioning models and their final evaluation criteria.…”
Section: Introduction (mentioning)
confidence: 99%
“…The capability of the machines to comprehend and complete users' goals has empowered researchers to build advanced dialogue systems. With the progress in visual question answering [1,2] and image captioning [3,4], the use of different modalities in dialogue agents has shown remarkable performance bringing the different areas of computer vision (CV) and natural language processing (NLP) together. Hence, multimodal dialogue system bridges the gap between vision and language, ensuring interdisciplinary research.…”
Section: Introduction (mentioning)
confidence: 99%