2010
DOI: 10.1007/978-3-642-15573-4_15

Generating Referring Expressions in Context: The GREC Task Evaluation Challenges

Abstract: Until recently, referring expression generation (REG) research focused on the task of selecting the semantic content of definite mentions of listener-familiar discourse entities. In the GREC research programme we have been interested in a version of the REG problem definition that is (i) grounded within discourse context, (ii) embedded within an application context, and (iii) informed by naturally occurring data. This paper provides an overview of our aims and motivations in this research programme, …

Cited by 33 publications (40 citation statements)
References 27 publications (33 reference statements)
“…Viethen & Dale, 2007; van Deemter, Gatt, van der Sluis, & Power, 2012). Recent work in the context of a series of NLG shared tasks, in which participants are required to design algorithms that are developed and tested against a common dataset to enable comparison, has shown that results from these two perspectives may diverge significantly (Belz, Kow, Viethen, & Gatt, 2010). For instance, an algorithm's choice of content for referential descriptions may be very similar to the choices humans make, as shown by its degree of match to corpus data, but this does not imply that the resulting description will be easily resolved by human listeners.…”
Section: How Should Models Be Evaluated? (mentioning)
confidence: 99%
“…What we have found again and again in our evaluation experiments (see also Gatt & Belz [12] and Belz et al. [6], both in this volume) is that different types of evaluation do not necessarily agree with each other, and that we should not regard any single one of them as an objective measure of quality, but rather as assessing one particular aspect of systems. If we want our wind forecasts to be similar to the corpus forecasts, then BLEU and NIST can probably give us a fair indication of that type of similarity; if we are interested in how readable and clear human readers (think they) find our forecasts, then we should look at the Clarity and Readability scores.…”
Section: System (mentioning)
confidence: 62%
“…The two automatic metrics used in the evaluations, NIST⁵ and BLEU,⁶ have previously been used in the METEO domain [3]. BLEU-x is an n-gram based string comparison measure, originally proposed by Papineni et al. [19] for evaluation of MT systems.…”
Section: Automatic Evaluation Methods (mentioning)
confidence: 99%
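The excerpt above characterises BLEU as an n-gram based string comparison measure. As a rough, self-contained illustration of that idea (not the BLEU-x or NIST implementations actually used in the GREC or METEO evaluations), the following Python sketch computes a single-reference BLEU score from clipped n-gram precisions and a brevity penalty; the toy forecast strings are invented.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """Return the multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference BLEU-N sketch: clipped n-gram precisions
    combined with a brevity penalty, in the spirit of Papineni et al. (2002)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values()) or 1
        # tiny floor keeps a single zero precision from zeroing the whole score
        precisions.append(max(overlap, 1e-9) / total)
    geo_mean = exp(sum(log(p) for p in precisions) / max_n)
    # brevity penalty: penalise candidates shorter than the reference
    bp = 1.0 if len(candidate) >= len(reference) else exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

# Toy example in the spirit of the wind-forecast evaluations mentioned above
reference = "south westerly 10 to 15 backing southerly by evening".split()
candidate = "south westerly 10 to 15 becoming southerly by evening".split()
print(round(bleu(candidate, reference), 3))
```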
“…These tasks aimed at the unique identification of the referent and brevity of the referring expression. Slightly differently, the GREC challenges (Belz et al., 2009; Belz et al., 2010) propose the generation of referring expressions in a discourse context. The GREC tasks use a corpus created from Wikipedia abstracts on geographic entities and people, with two referring expression annotation schemes: reference type and word strings.…”
Section: Generating Referring Expressions (GRE) (mentioning)
confidence: 99%
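The excerpt above mentions the two GREC annotation layers, reference type and word string. The sketch below is a hypothetical Python rendering of that idea only; the class names, field names, and example values are illustrative assumptions, not the actual GREC XML schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReferringExpression:
    # Both field names are illustrative stand-ins, not GREC attribute names.
    ref_type: str     # reference type annotation, e.g. "name", "pronoun", "common", "empty"
    word_string: str  # the surface word string selected for the mention

@dataclass
class Mention:
    entity_id: str             # discourse entity the mention refers to
    position: int              # order of the mention within the abstract
    gold: ReferringExpression  # human annotation taken from the Wikipedia text

# A toy chain of mentions of one entity within a single abstract
chain: List[Mention] = [
    Mention("person-1", 0, ReferringExpression("name", "Marie Curie")),
    Mention("person-1", 1, ReferringExpression("pronoun", "she")),
    Mention("person-1", 2, ReferringExpression("common", "the physicist")),
]

# A GREC-style system would predict a ReferringExpression for each Mention
# in context and be scored against the gold annotation.
for m in chain:
    print(m.position, m.gold.ref_type, repr(m.gold.word_string))
```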