We present an annotation effort that involves adding a new layer of annotation to an existing corpus. We are interested in how rhetorical relations are signalled in discourse, and thus begin with a corpus already annotated for rhetorical relations, to which we add signalling information. We show that a very large number of relations carry signals that identify them as such. The detailed, extensive analysis of signals in the corpus will aid research in the automatic parsing of discourse relations.
We present the RST Signalling Corpus (Das et al., 2015), a corpus annotated for signals of coherence relations. The corpus is built on the RST Discourse Treebank (Carlson et al., 2002), which is annotated for coherence relations. In the RST Signalling Corpus, these relations are further annotated with signalling information. The corpus includes annotation not only for discourse markers, which are considered the most typical (or sometimes the only) type of signal in discourse, but also for a wide array of other signals, such as reference, lexical, semantic, syntactic, graphical and genre features, as potential indicators of coherence relations. We describe the research underlying the development of the corpus and the annotation process, and provide details of the corpus. We also present the results of an inter-annotator agreement study, illustrating the validity and reproducibility of the annotation. The corpus is available through the Linguistic Data Consortium (LDC), and can be used to investigate the psycholinguistic mechanisms behind the interpretation of relations through signalling, as well as to develop discourse-specific computational systems such as discourse parsers.
We present a new lexicon of English discourse connectives called DiMLex-Eng, built by merging information from two annotated corpora and an additional list of relation signals from the literature. The format follows the German connective lexicon DiMLex, which provides a cross-linguistically applicable XML schema. DiMLex-Eng contains 149 English connectives, and gives information on syntactic categories, discourse semantics and non-connective uses (if any). We report on the development steps and discuss the design decisions made during the lexicon expansion phase. The resource is freely available for use in studies of discourse structure and in computational applications.
We present a proposal to analyze disagreement in Rhetorical Structure Theory annotation which takes into account what we consider "legitimate" disagreements. In rhetorical analysis, as in many other pragmatic annotation tasks, a certain amount of disagreement is to be expected, and it is important to distinguish true mistakes from legitimate disagreements due to different possible interpretations of the structure and intention of a text. Using different sets of annotations in German and English, we present an analysis of such possible disagreements, and propose an underspecified representation that captures the disagreements.
We examine the role of discourse relations (relations between propositions) in the interpretation of evaluative or opinion words. Combining Rhetorical Structure Theory (RST; Mann & Thompson, 1988) and Appraisal Theory (Martin & White, 2005), we analyze how different discourse relations modify the evaluative content of opinion words, and what impact the nucleus-satellite structure in RST has on the evaluation. We conduct a corpus study, examining and annotating over 3,000 evaluative words in 50 movie reviews from the SFU Review Corpus (Taboada, 2008) with respect to five parameters: word category (nouns, verbs, adjectives or adverbs), prior polarity (positive, negative or neutral), RST nuclearity (nucleus or satellite), RST relation type, and change of polarity as a result of being part of a discourse relation (Intensify, Downtone, Reversal or No Change). Results show that relations such as Concession, Elaboration, Evaluation, Evidence and Restatement most frequently intensify the polarity of the opinion words, although the majority of evaluative words (about 70%) do not undergo changes in their polarity because of the relations they are a part of. We also find that most opinion words (about 70%) are positioned in the nucleus, confirming the hypothesis in the literature that nuclei are the most important units when extracting evaluation automatically.
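To make the five annotation parameters concrete, here is a minimal sketch of one possible record layout and of how the reported proportions (share of words with No Change, share in the nucleus, relations that intensify) could be tallied. The class, field and function names are hypothetical illustrations, not the authors' annotation format, and the example words are invented rather than taken from the SFU Review Corpus.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class PolarityChange(Enum):
    """The four polarity-change labels named in the abstract."""
    INTENSIFY = "Intensify"
    DOWNTONE = "Downtone"
    REVERSAL = "Reversal"
    NO_CHANGE = "No Change"


@dataclass(frozen=True)
class EvaluativeWord:
    """One annotated opinion word (hypothetical layout, one field per parameter)."""
    word: str
    category: str        # "noun", "verb", "adjective" or "adverb"
    prior_polarity: str  # "positive", "negative" or "neutral"
    nuclearity: str      # "nucleus" or "satellite"
    relation: str        # RST relation type, e.g. "Concession"
    change: PolarityChange


def share(words, predicate):
    """Fraction of annotated words satisfying predicate (0.0 for an empty list)."""
    if not words:
        return 0.0
    return sum(1 for w in words if predicate(w)) / len(words)


def summarize(words):
    """Proportions analogous to the two ~70% figures reported in the abstract."""
    return {
        "unchanged": share(words, lambda w: w.change is PolarityChange.NO_CHANGE),
        "in_nucleus": share(words, lambda w: w.nuclearity == "nucleus"),
    }


def intensifying_relations(words):
    """Count, per relation type, the words whose polarity the relation intensifies."""
    return Counter(w.relation for w in words if w.change is PolarityChange.INTENSIFY)


if __name__ == "__main__":
    # Invented toy annotations for illustration only.
    sample = [
        EvaluativeWord("brilliant", "adjective", "positive", "nucleus",
                       "Elaboration", PolarityChange.INTENSIFY),
        EvaluativeWord("dull", "adjective", "negative", "satellite",
                       "Concession", PolarityChange.NO_CHANGE),
        EvaluativeWord("enjoy", "verb", "positive", "nucleus",
                       "Evidence", PolarityChange.NO_CHANGE),
    ]
    print(summarize(sample))
    print(intensifying_relations(sample))
```

Keeping nuclearity and relation type as separate fields mirrors the abstract's count of five parameters and lets each be tallied independently.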
In this paper, we investigate the signalling of coherence relations when they are simultaneously indicated by more than one signal. In particular, we examine the co-occurrence of discourse markers and other relational signals when they are used together to mark a single relation. With the goal of identifying the source of multiple signalling, we postulate a twofold hypothesis: the co-occurrence of discourse markers and other textual signals can result from the type of the discourse markers themselves, or it can be triggered by the semantics of the relations in question. We conduct a corpus study, examining instances of multiple signals (co-occurrence of discourse markers and other signals) in the RST Signalling Corpus (Das et al., 2015). We analyze the discourse markers that appear as part of multiple signals, as well as the relations that frequently employ multiple signals as their indicators. Our observations suggest that the signalling of relations by multiple signals is a complex phenomenon, since the co-occurrence of discourse markers and other textual signals appears to arise from multiple sources.
We introduce our pilot study applying PDTB-style annotation to Twitter conversations. Lexically grounded coherence annotation for Twitter threads will enable detailed investigations of the discourse structure of conversations on social media. Here, we present our corpus of 185 threads and its annotation, including an inter-annotator agreement study. We discuss our observations on how Twitter discourse differs from written news text with respect to discourse connectives and relations. We confirm our hypothesis that discourse relations in written social media conversations are expressed differently than in (news) text. We find that on Twitter, connective arguments are frequently not full syntactic clauses, and that a few general connectives expressing EXPANSION and CONTINGENCY make up the majority of the explicit relations in our data.