We argue the case for abstract document structure as a separate descriptive level in the analysis and generation of written texts. The purpose of this representation is to mediate between the message of a text (i.e., its discourse structure) and its physical presentation (i.e., its organization into graphical constituents like sections, paragraphs, sentences, bulleted lists, figures, and footnotes). Abstract document structure can be seen as an extension of Nunberg's “text-grammar” it is also closely related to “logical” markup in languages like HTML and LaTEX. We show that by using this intermediate representation, several subtasks in language generation and language understanding can be defined more cleanly.
Team sports commentaries call for techniques that are able to select content and generate wordings to reflect the affinity of the targeted reader for one of the teams. The existing works tend to have in common that they either start from knowledge sources of limited size to whose structures then different ways of realization are explicitly assigned, or they work directly with linguistic corpora, without the use of a deep knowledge source. With the increasing availability of large-scale ontologies this is no longer satisfactory: techniques are needed that are applicable to general purpose ontologies, but which still take user preferences into account. We take the best of both worlds in that we use a two-layer ontology. The first layer is composed of raw domain data modelled in an application-independent base OWL ontology. The second layer contains a rich perspective generation-motivated domain communication knowledge ontology, inferred from the base ontology. The two-layer ontology allows us to take into account user perspective-oriented criteria at different stages of generation to generate perspective-oriented commentaries. We show how content selection, discourse structuring, information structure determination, and lexicalization are driven by these criteria and how stage after stage a truly user perspective-tailored summary is generated. The viability of our proposal has been evaluated for the generation of football match summaries of the First Spanish Football League. The reported outcome of the evaluation demonstrates that we are on the right track.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.