Proceedings of the 9th International Natural Language Generation Conference 2016
DOI: 10.18653/v1/w16-6642
Towards Generating Colour Terms for Referents in Photographs: Prefer the Expected or the Unexpected?

Abstract: Colour terms have been a prime phenomenon for studying language grounding, though previous work focussed mostly on descriptions of simple objects or colour swatches. This paper investigates whether colour terms can be learned from more realistic and potentially noisy visual inputs, using a corpus of referring expressions to objects represented as regions in real-world images. We obtain promising results from combining a classifier that grounds colour terms in visual input with a recalibration model that adjust…
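The two-stage setup the abstract describes — a classifier that grounds colour terms in visual input, followed by a recalibration step that adjusts its scores — can be sketched roughly as follows. The prototype-distance classifier, the linear blending rule, and all names below are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch of the two-stage idea in the abstract:
# (1) ground colour terms in visual features of an image region,
# (2) recalibrate the raw visual scores.
# The prototype classifier and the blending rule are illustrative
# assumptions, not the model described in the paper.

PROTOTYPES = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}

def colour_scores(mean_rgb):
    """Toy grounding classifier: score each colour term by how close
    the region's mean RGB value is to a prototype colour."""
    scores = {}
    for term, proto in PROTOTYPES.items():
        dist = sum((a - b) ** 2 for a, b in zip(mean_rgb, proto)) ** 0.5
        scores[term] = 1.0 / (1.0 + dist / 255.0)
    return scores

def recalibrate(scores, prior, weight=0.7):
    """Toy recalibration: blend visual scores with corpus-frequency
    priors over colour terms (illustrative only)."""
    return {t: weight * s + (1 - weight) * prior.get(t, 0.0)
            for t, s in scores.items()}

prior = {"red": 0.5, "green": 0.3, "blue": 0.2}
raw = colour_scores((200, 40, 40))   # a reddish image region
adjusted = recalibrate(raw, prior)
best = max(adjusted, key=adjusted.get)
```

For the reddish region above, both stages agree and the recalibrated scores still rank "red" first; the interesting cases in the paper are noisy regions where recalibration changes the ranking.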

Cited by 7 publications (11 citation statements)
References 16 publications
“…Similarly, distributional similarities between colors seem to be misleading rather than helpful, cf. (Zarrieß and Schlangen, 2016b) for a study on color adjectives on the same corpus. This effect seems to be related to findings on antonyms in distributional modeling (Nguyen et al, 2016).…”
Section: Results
confidence: 99%
“…The dialogue system's task is to generate expressions referring to objects in real-world images, intending to identify these objects to a human listener. The system's underlying generation component predicts words directly from low-level visual input representations of the target object defined via a bounding box in the image, based on the approach in [6,7]. As illustrated in Figure 1, this can lead to partially defective utterances being generated, due to imperfect visual language grounding [7].…”
Section: Introduction
confidence: 99%
“…Thus, it has been argued that dialogue systems interacting with users in real-world environments need principled communicative mechanisms for dealing with uncertainties, perceptual mismatches, and potential misunderstandings [14,15,16,17]. In this study, we look at (potentially less disturbing) defective color terms, which turn out to be hard to predict for objects in real-world images as well [6]. Thus, for compiling the materials of our experiment, we used color terms predicted for objects in images by [6]'s model which we identified as defective based on annotated color terms in the training set of the model.…”
Section: Introduction
confidence: 99%
“…While there have been previous approaches to generating referring expressions (REs) under uncertainty, those algorithms have been explicitly designed to refer to objects in visual scenes, and as such are tightly integrated with visual classifiers (Zarrieß and Schlangen, 2016; Roy, 2002; Meo et al, 2014). This is problematic for at least two reasons: First, intelligent agents may need to generate REs for a much wider class of entities than those appearing in a visual scene (e.g., agents, locations, ideas, utterances), which may not be possible if an REG algorithm is tightly coupled with visual classifiers.…”
Section: Introduction
confidence: 99%