Proceedings of the 16th International Natural Language Generation Conference 2023
DOI: 10.18653/v1/2023.inlg-main.21

HL Dataset: Visually-grounded Description of Scenes, Actions and Rationales

Michele Cafagna, Kees van Deemter, Albert Gatt

Abstract: Current captioning datasets focus on object-centric captions, describing the visible objects in the image, e.g. "people eating food in a park". Although these datasets are useful for evaluating the ability of Vision & Language models to recognize and describe visual content, they do not support controlled experiments involving model testing or fine-tuning with more high-level captions, which humans find easy and natural to produce. For example, people often describe images based on the type of scene they depict (…)

Cited by 1 publication · References 29 publications