2009
DOI: 10.1007/s10579-009-9108-x

AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan

Abstract: This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400 k each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85-89% above chance, and we provide an analysis of the sources of disagreement. The resulting corpora make it possible…

Cited by 51 publications (46 citation statements)
References 28 publications

“…This is illustrated with examples extracted/adapted from different sources (Dras 1999;Doddington et al 2004;Dolan, Brockett, and Quirk 2005;Recasens and Martí 2010;Vila et al 2010) and our own. Apart from providing a better understanding of these tasks, we point out ways in which they can mutually benefit, which can shed light on future research.…”
Section: Introduction (mentioning)
confidence: 86%
“…Besides the NXT-format Switchboard Corpora, there are a number of treebanks annotated with coreference in multiple languages, e.g., the Tübingen (Tüba D/Z) Treebank of German news text (Hinrichs et al, 2004), the NAIST Text Corpus of Japanese news text (Iida et al, 2007), and the AnCora-CO Corpus of Spanish and Catalan news text (Recasens and Marti, 2010). In the SemEval-2010 Shared Task "Coreference Resolution in Multiple Languages", some of these resources were used when the coreference task was extended to cover Catalan and Spanish (the AnCora-CO corpora), German (TüBa-D/Z), Dutch (KNACK (Hoste, 2005)), and Italian (LiveMemories (Rodriguez et al, 2010)).…”
Section: Corpora Annotated With Coreference (mentioning)
confidence: 99%
“…Through metonymy, a set of associations is transferred that may be important to the interpretation of the utterance. Following Recasens and Marti (2010), we argue that NPs with different semantic references can pragmatically corefer within a discourse through metonymy, and that this permits the annotation of coreference links in such cases. Frequent examples are the use of the name of a country, the capital of a country, or the building that is the seat of government to mean the government of that country, or the use of a name of a city to refer to a sports team.…”
Section: Metonymy (mentioning)
confidence: 99%
“…Notable exceptions are Botley (2006), Hedberg et al (2007), Navarretta and Olsen (2008), , who distinguish between semantic types such as events, processes, states (eventualities) and facts, propositions (factualities). Recasens and Martí (2009) use a different classification; they define the types token, type, proposition. Poesio and Artstein (2008) annotate selected semantic properties such as person, animate, concrete, space, time, etc.…”
Section: Related Work (mentioning)
confidence: 99%