This article aims to present a set of discourse structure relations that are easy to code and to develop criteria for an appropriate data structure for representing these relations. Discourse structure here refers to informational relations that hold between sentences in a discourse. The set of discourse relations introduced here is based on Hobbs (1985). We present a method for annotating discourse coherence structures that we used to manually annotate a database of 135 texts from the Wall Street Journal and the AP Newswire. Alltexts were independently annotated by two annotators. Kappa values of greater than 0.8 indicated good interannotator agreement. We furthermore present evidence that trees are not a descriptively adequate data structure for representing discourse structure: In coherence structures of naturally occurring texts, we found many different kinds of crossed dependencies, as well as many nodes with multiple parents. The claims are supported by statistical results from our hand-annotated database of 135 texts.
This paper used self-paced reading to test processing preferences in pronoun interpretation in English two clause sentences. The results demonstrate that people's preferences can be reversed by changing the coherence relation between the clauses. The results are not compatible with the existence of a single all-purpose strategy in pronoun resolution. Rather, the results support Kehler's (2002) hypothesis that the processing patterns observed in pronoun processing are a byproduct of more general cognitive inference processes underlying the establishment of coherence, such that discourse coherence guides pronoun reference, and pronoun reference guides discourse coherence.
Sentence ranking is a crucial part of generating text summaries. We compared human sentence rankings obtained in a psycholinguistic experiment to three different approaches to sentence ranking: A simple paragraph-based approach intended as a baseline, two word-based approaches, and two coherence-based approaches. In the paragraph-based approach, sentences in the beginning of paragraphs received higher importance ratings than other sentences. The word-based approaches determined sentence rankings based on relative word frequencies (Luhn (1958); Salton & Buckley (1988)). Coherence-based approaches determined sentence rankings based on some property of the coherence structure of a text (Marcu (2000); Page et al. (1998)). Our results suggest poor performance for the simple paragraph-based approach, whereas wordbased approaches perform remarkably well. The best performance was achieved by a coherence-based approach where coherence structures are represented in a non-tree structure. Most approaches also outperformed the commercially available MSWord summarizer.
We present a set of discourse structure relations that are easy to code, and develop criteria for an appropriate data structure for representing these relations.Discourse structure here refers to informational relations that hold between sentences in a discourse (cf. Hobbs, 1985). We evaluated whether trees are a descriptively adequate data structure for representing coherence. Trees are widely assumed as a data structure for representing coherence but we found that more powerful data structures are needed: In coherence structures of naturally occurring texts, we found many different kinds of crossed dependencies, as well as many nodes with multiple parents. The claims are supported by statistical results from a database of 135 texts from the Wall Street Journal and the AP Newswire that were hand-annotated with coherence relations, based on the annotation schema presented in this paper.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.