2015
DOI: 10.1515/cllt-2013-0008

Establishing criteria for RST-based discourse segmentation and annotation for texts in Basque

Abstract: This article presents a discourse annotation methodology based on Rhetorical Structure Theory and an empirical study of annotating a corpus of specialized medical texts in Basque. The annotation process includes two phases: segmentation and annotation of rhetorical relations. Phase one entails an initial study which leads to establishing linguistic criteria for sentence-based segmentation; a second phase focuses on annotation of rhetorical relations. After establishing discourse segments and rhetorical relatio…

Cited by 17 publications (12 citation statements)
References: 5 publications

“…It is probably fair to say, however, that this annotation has mostly been applied in computational linguistics/natural language processing setting rather than in corpus linguistics proper, which is why we do not discuss this in depth. Examples for such corpora include the Lancaster Anaphoric Treebank, the Rhetorical Structure Discourse Treebank (Carlson, Marcu, and Okurowski 2003), which contains, "among other data, […] articles from the Penn Treebank, which were annotated with discourse structure in the framework of Rhetorical Structure Theory" [88,762], the EUSKAL RST Treebank-A (https://ixa.si.ehu.es/Ixa/resources/Euskal_RSTTreebank), a very small corpus (approximately 3 K words) of abstracts of medical articles annotated on the basis of Rhetorical Structure Theory [36], and the Penn Discourse Treebank [67]. Mitkov [59] briefly discusses examples of bi-/multilingual parallel corpora which have been annotated for anaphoric or coreferential relationships; cf.…”
Section: Discourse-pragmatic Annotation; mentioning (confidence: 99%)
“…Following [30] and [31], we have also calculated inter-annotator agreement by using Kappa Cohen in two ways: taking into account words as boundaries and taking into account clauses as boundaries. For the first one, the Kappa value is 0.9556 and, for the second one (that is more conservative), the Kappa value is 0.8674.…”
Section: Corpus; mentioning (confidence: 99%)
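The statement above reports Cohen's kappa computed over segment-boundary decisions at two granularities (candidate positions after each word, or after each clause). As a rough illustration, and assuming each candidate position is given a binary label by each annotator (1 = a segment boundary is placed here, 0 = no boundary), kappa compares observed agreement p_o with the agreement p_e expected by chance from each annotator's label proportions: kappa = (p_o - p_e) / (1 - p_e). The function name and the toy labels below are hypothetical and are not taken from the corpus; the sketch does not reproduce the reported values.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of positions where both annotators chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal proportions, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical word-level boundary labels (1 = boundary after this word).
annotator_a = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
annotator_b = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]

print(round(cohen_kappa(annotator_a, annotator_b), 3))  # → 0.737
```

Here the annotators agree on 9 of 10 positions (p_o = 0.9), but because most positions are non-boundaries, chance agreement is already p_e = 0.62, so kappa drops to about 0.737. A clause-level calculation would apply the same function to one label per clause-final position instead of per word.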
“…According to Iruskieta et al (2013), Computational Linguistics depends on discourse annotated corpora for the creation of automatic applications. The research that resulted in this paper intends to create a dictionary for sentiment analysis by extracting comments from Facebook public pages related to diverse themes, such as politics, education, religion, music, lifestyle etc.…”
Section: Introduction; mentioning (confidence: 99%)