Establishing criteria for RST-based discourse segmentation and annotation for texts in Basque

Iruskieta, Mikel; Ilarraza, Arantza Díaz de; Lersundi, Mikel

doi:10.1515/cllt-2013-0008

Cited by 17 publications

(12 citation statements)

References 5 publications

“…It is probably fair to say, however, that this annotation has mostly been applied in computational linguistics/natural language processing setting rather than in corpus linguistics proper, which is why we do not discuss this in depth. Examples for such corpora include the Lancaster Anaphoric Treebank, the Rhetorical Structure Discourse Treebank (Carlson, Marcu, and Okurowski 2003), which contains, "among other data, […] articles from the Penn Treebank, which were annotated with discourse structure in the framework of Rhetorical Structure Theory" [88,762], the EUSKAL RST Treebank-A (https://ixa.si.ehu.es/Ixa/resources/Euskal_RSTTreebank), a very small corpus (approximately 3 K words) of abstracts of medical articles annotated on the basis of Rhetorical Structure Theory [36], and the Penn Discourse Treebank [67]. Mitkov [59] briefly discusses examples of bi-/multilingual parallel corpora which have been annotated for anaphoric or coreferential relationships; cf.…”

Section: Discourse-pragmatic Annotation (mentioning)

confidence: 99%

Linguistic Annotation in/for Corpus Linguistics

Gries

Berez

2017

Handbook of Linguistic Annotation

This article surveys linguistic annotation in corpora and corpus linguistics. We first define the concept of 'corpus' as a radial category and then, in Sect. 2, discuss a variety of kinds of information for which corpora are annotated and that are exploited in contemporary corpus linguistics. Section 3 then exemplifies many current formats of annotation with an eye to highlighting both the diversity of formats currently available and the emergence of XML annotation as, for now, the most widespread form of annotation. Section 4 summarizes and concludes with desiderata for future developments.

“…Following [30] and [31], we have also calculated inter-annotator agreement by using Kappa Cohen in two ways: taking into account words as boundaries and taking into account clauses as boundaries. For the first one, the Kappa value is 0.9556 and, for the second one (that is more conservative), the Kappa value is 0.8674.…”

Section: Corpus (mentioning)

confidence: 99%
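
The statement above reports Cohen's kappa computed over two kinds of candidate boundary positions: every word, and clause ends only. A minimal sketch of that comparison follows. It is not the cited authors' code: the toy boundary sets, the position lists, and the helper names `cohen_kappa` and `boundary_labels` are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Proportion of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def boundary_labels(boundaries, positions):
    """1 if a segment boundary was placed at the position, else 0."""
    return [1 if p in boundaries else 0 for p in positions]


# Hypothetical toy text: 20 word positions, with clause ends at 3, 7, 11, 15, 19.
word_positions = list(range(20))
clause_positions = [3, 7, 11, 15, 19]

# Hypothetical segment boundaries placed by two annotators.
annotator_a = {3, 11, 19}
annotator_b = {3, 11, 15, 19}

kappa_words = cohen_kappa(
    boundary_labels(annotator_a, word_positions),
    boundary_labels(annotator_b, word_positions),
)
kappa_clauses = cohen_kappa(
    boundary_labels(annotator_a, clause_positions),
    boundary_labels(annotator_b, clause_positions),
)

print(f"kappa over words:   {kappa_words:.4f}")
print(f"kappa over clauses: {kappa_clauses:.4f}")
```

On this toy data the clause-level value comes out lower than the word-level one. Counting every word as a candidate adds many positions where both annotators trivially agree that there is no boundary, which is consistent with the statement's description of the clause-based figure as the more conservative one.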

MultiLingMine 2016: Modeling, Learning and Mining for Cross/Multilinguality

Ienco

Roche

Romeo

et al. 2016

Lecture Notes in Computer Science

Abstract. The increasing availability of text information coded in many different languages poses new challenges to modern information retrieval and mining systems in order to discover and exchange knowledge at a larger world-wide scale. The 1st International Workshop on Modeling, Learning and Mining for Cross/Multilinguality (dubbed MultiLingMine 2016) provides a venue to discuss research advances in cross-/multilingual related topics, focusing on new multidisciplinary research questions that have not been deeply investigated so far (e.g., in CLEF and related events relevant to CLIR). This includes theoretical and experimental on-going works about novel representation models, learning algorithms, and knowledge-based methodologies for emerging trends and applications, such as, e.g., cross-view cross-/multilingual information retrieval and document mining, (knowledge-based) translation-independent cross-/multilingual corpora, applications in social network contexts, and more.

Motivations. In the last few years the phenomenon of multilingual information overload has received significant attention due to the huge availability of information coded in many different languages. We have in fact witnessed a growing popularity of tools that are designed for collaboratively editing through contributors across the world, which has led to an increased demand for methods capable of effectively and efficiently searching, retrieving, managing and mining different language-written document collections. The multilingual information overload phenomenon introduces new challenges to modern information retrieval systems. By better searching, indexing, and organizing such rich and heterogeneous information, we can discover and exchange knowledge at a larger world-wide scale.

However, since research on multilingual information is relatively young, important issues still remain uncovered:

- how to define a translation-independent representation of the documents across many languages;
- whether existing solutions for comparable corpora can be enhanced to generalize to multiple languages without depending on bilingual dictionaries or incurring bias in merging language-specific results;
- how to profitably exploit knowledge bases to enable translation-independent preserving and unveiling of content semantics;
- how to define proper indexing and multidimensional data structures to better capture the multi-topic and/or multi-aspect nature of multi-lingual documents;
- how to detect duplicate or redundant information among different languages or, conversely, novelty in the produced information;
- how to enrich and update multi-lingual knowledge bases from documents;
- how to exploit multi-lingual knowledge bases for question answering;
- how to efficiently extend topic modeling to deal with multi/cross-lingual documents in many languages;
- how to evaluate and visualize retrieval and mining results.

Objectives, topics, and outcomes. The aim of the 1st International Workshop on Modeling, Learning and Mining for Cross/Multilinguality (dubbed Mu...

“…According to Iruskieta et al (2013), Computational Linguistics depends on discourse annotated corpora for the creation of automatic applications. The research that resulted in this paper intends to create a dictionary for sentiment analysis by extracting comments from Facebook public pages related to diverse themes, such as politics, education, religion, music, lifestyle etc.…”

Section: Introduction (mentioning)

confidence: 99%

“Haters gonna hate”: challenges for sentiment analysis of Facebook comments in Brazilian Portuguese

Antônio

Santin

2017

Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

The aim of this paper is to present reflections from Discourse Analysis and from Construction Grammar on the creation of a dictionary for sentiment analysis of Facebook comments. The reflections from Discourse Analysis address problems such as the identification of the semantic orientation of words that present opposite polarities depending on the ideological formation of the speaker. Another reflection from Discourse Analysis regards the fact that the writers of the comments use nouns and noun phrases not only to name some entity, but also to build discourse objects in a way that the label they give to the discourse objects reveals an evaluation. In order to analyze constructions larger than words, such as idioms, we draw on Construction Grammar principles. The investigation of constructions and idioms can provide a better understanding of sentiment in text. The corpus consists of comments extracted manually from Facebook public discussion pages related to diverse themes, such as politics, education, religion, music, lifestyle etc.
