Proceedings of the 12th International Conference on Natural Language Generation 2019
DOI: 10.18653/v1/w19-8621
Tell Me More: A Dataset of Visual Scene Description Sequences

Abstract: We present a dataset consisting of what we call image description sequences. These multisentence descriptions of the contents of an image were collected in a pseudo-interactive setting, where the describer was told to describe the given image to a listener who needs to identify the image within a set of images, and who successively asks for more information. As we show, this setup produced nicely structured data that, we think, will be useful for learning models capable of planning and realising such descripti…

Cited by 20 publications (22 citation statements)
References 18 publications (15 reference statements)
“…Visual Dialogues have been the aim of early work on natural language understanding (NLU) (Winograd, 1972) and are now studied by a very active community at the interplay between computer vision and computational linguistics (e.g. Baldridge et al (2018); Ilinykh et al (2019); Haber et al (2019)). Recently, important progress has been made on visual dialogue systems thanks to the release of datasets like Vis-Dial (Das et al, 2017) and GuessWhat?!…”
Section: Introduction
confidence: 99%
“…Visual Dialogues have a long tradition (e.g., Anderson et al, 1991). They can be chit-chat (e.g., Das et al, 2017) or task-oriented (e.g., de Vries et al, 2017; Haber et al, 2019; Ilinykh et al, 2019a, b). Task-oriented dialogues are easier to evaluate since their performance can be judged in terms of their task-success, hence we focus on this type of dialogues, which can be further divided as follows: the two agents can have access to the same visual information (de Vries et al, 2017), share only part of it (Haber et al, 2019; Ilinykh et al, 2019a), or only one agent has access to the image (Chattopadhyay et al, 2017).…”
Section: Introduction
confidence: 99%
“…Instead, the reference is guided by visual attention. We present a linguistic perspective on these challenges by analysing a pilot annotation of two situated dialogue corpora: the Cups corpus (Dobnik et al, 2020) and the Tell-me-more corpus (Ilinykh et al, 2019), shown below in Figure 1 and example (1) respectively. Starting from the annotation scheme for several textual coreference datasets (Artstein and Poesio, 2006; Pradhan et al, 2007; Uryupina et al, 2019), this exercise proved useful to pinpoint in what ways the purely textual document scenario is different from the domain of embodied interaction.…”
Section: Introduction
confidence: 99%