2010
DOI: 10.1007/978-3-642-15573-4_15

Generating Referring Expressions in Context: The GREC Task Evaluation Challenges

Abstract: Until recently, referring expression generation (REG) research focused on the task of selecting the semantic content of definite mentions of listener-familiar discourse entities. In the GREC research programme we have been interested in a version of the REG problem definition that is (i) grounded within discourse context, (ii) embedded within an application context, and (iii) informed by naturally occurring data. This paper provides an overview of our aims and motivations in this research programme, …

Cited by 33 publications (40 citation statements)
References 27 publications (33 reference statements)
“…Viethen & Dale, 2007; van Deemter, Gatt, van der Sluis, & Power, 2012). Recent work in the context of a series of NLG shared tasks, in which participants are required to design algorithms that are developed and tested against a common dataset to enable comparison, has shown that results from these two perspectives may diverge significantly (Belz, Kow, Viethen, & Gatt, 2010). For instance, an algorithm's choice of content for referential descriptions may be very similar to the choices humans make, as shown by its degree of match to corpus data, but this does not imply that the resulting description will be easily resolved by human listeners.…”
Section: How Should Models Be Evaluated? (mentioning)
confidence: 99%
“…What we have found again and again in our evaluation experiments (see also Gatt & Belz [12] and Belz et al. [6], both in this volume) is that different types of evaluation do not necessarily agree with each other, and that we should not regard any single one of them as an objective measure of quality, but rather as assessing one particular aspect of systems. If we want our wind forecasts to be similar to the corpus forecasts, then BLEU and NIST can probably give us a fair indication of that type of similarity; if we are interested in how readable and clear human readers (think they) find our forecasts, then we should look at the Clarity and Readability scores.…”
Section: System (mentioning)
confidence: 62%
“…The two automatic metrics used in the evaluations, NIST⁵ and BLEU,⁶ have previously been used in the METEO domain [3]. BLEU-x is an n-gram based string comparison measure, originally proposed by Papineni et al. [19] for evaluation of MT systems.…”
Section: Automatic Evaluation Methods (mentioning)
confidence: 99%
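The excerpt above characterises BLEU as an n-gram based string comparison measure. As a rough, self-contained illustration of that idea (not the BLEU-x or NIST implementations actually used in the GREC or METEO evaluations), the following Python sketch computes a single-reference BLEU score from clipped n-gram precisions and a brevity penalty; the toy forecast strings are invented.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """Return the multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU-N sketch: clipped n-gram precisions
    combined with a brevity penalty, in the spirit of Papineni et al. (2002)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values()) or 1
        # tiny floor keeps a single zero precision from zeroing the whole score
        precisions.append(max(overlap, 1e-9) / total)
    geo_mean = exp(sum(log(p) for p in precisions) / max_n)
    # brevity penalty: penalise candidates shorter than the reference
    bp = 1.0 if len(candidate) >= len(reference) else exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

# Toy example in the spirit of the wind-forecast evaluations mentioned above
reference = "south westerly 10 to 15 backing southerly by evening".split()
candidate = "south westerly 10 to 15 becoming southerly by evening".split()
print(round(bleu(candidate, reference), 3))
```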
“…These tasks aimed at the unique identification of the referent and brevity of the referring expression. Slightly differently, the GREC challenges (Belz et al., 2009; Belz et al., 2010) propose the generation of referring expressions in a discourse context. The GREC tasks use a corpus created from Wikipedia abstracts on geographic entities and people, with two referring expression annotation schemes: reference type and word strings.…”
Section: Generating Referring Expressions (GRE) (mentioning)
confidence: 99%
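The excerpt above mentions the two GREC annotation layers, reference type and word string. The sketch below is a hypothetical Python rendering of that idea only; the class names, field names, and example values are illustrative assumptions, not the actual GREC XML schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReferringExpression:
    # Both field names are illustrative stand-ins, not GREC attribute names.
    ref_type: str     # reference type annotation, e.g. "name", "pronoun", "common", "empty"
    word_string: str  # the surface word string selected for the mention

@dataclass
class Mention:
    entity_id: str             # discourse entity the mention refers to
    position: int              # order of the mention within the abstract
    gold: ReferringExpression  # human annotation taken from the Wikipedia text

# A toy chain of mentions of one entity within a single abstract
chain: List[Mention] = [
    Mention("person-1", 0, ReferringExpression("name", "Marie Curie")),
    Mention("person-1", 1, ReferringExpression("pronoun", "she")),
    Mention("person-1", 2, ReferringExpression("common", "the physicist")),
]

# A GREC-style system would predict a ReferringExpression for each Mention
# in context and be scored against the gold annotation.
for m in chain:
    print(m.position, m.gold.ref_type, repr(m.gold.word_string))
```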