Generation of Referring Expressions is a thriving subfield of Natural Language Generation which has traditionally focused on the task of selecting a set of attributes that unambiguously identify a given referent. In this paper, we address the complementary problem of generating repeated, potentially different referential expressions that refer to the same entity in the context of a piece of discourse longer than a sentence. We describe a corpus of short encyclopaedic texts we have compiled and annotated for reference to the main subject of the text, and report results for our experiments in which we set human subjects and automatic methods the task of selecting a referential expression from a wide range of choices in a full-text context. We find that our human subjects agree on choice of expression to a considerable degree, with three identical expressions selected in 50% of cases. We tested automatic selection strategies based on most frequent choice heuristics, involving different combinations of information about syntactic MSR type and domain type. We find that more information generally produces better results, achieving a best overall test set accuracy of 53.9% when both syntactic MSR type and domain type are known.
We explore the relationship between question answering and constraint relaxation in spoken dialog systems. We develop dialogue strategies for selecting and presenting information succinctly. In particular, we describe methods for dealing with the results of database queries in informationseeking dialogs. Our goal is to structure the dialogue in such a way that the user is neither overwhelmed with information nor left uncertain as to how to refine the query further. We present evaluation results obtained from a user study involving 20 subjects in a restaurant selection task.
We investigate the use of instance-based ranking methods for surface realization in natural language generation. Our approach to instance-based natural language generation (IBNLG) employs two components: a rule system that 'overgenerates' a number of realization candidates from a meaning representation and an instance-based ranker that scores the candidates according to their similarity to examples taken from a training corpus. We develop an efficient search technique for identifying the optimal candidate based on a novel extension of the A * algorithm. The rule system is produced automatically from a semantically annotated fragment of the Penn Treebank II containing management succession texts. We detail the annotation scheme and grammar induction algorithm and evaluate the efficiency and output of the generator. We also discuss issues such as input coverage (completeness) and fluency that are relevant to surface generation in general.
We present the ADAMACH data centric dialog system, that allows to perform on-and offline mining of dialog context, speech recognition results and other system-generated representations, both within and across dialogs. The architecture implements a "fat pipeline" for speech and language processing. We detail how the approach integrates domain knowledge and evolving empirical data, based on a user study in the University Helpdesk domain.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.