2009
DOI: 10.1515/cllt.2009.010

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues

Cited by 5 publications (6 citation statements)

“…On the other hand, an annotated text corpus is generated by way of attaching various tags of intralinguistic and extralinguistic information to the text samples (Leech, 2005; Berez & Gries, 2010). After annotation, a text corpus changes to a large extent in its character, content, form, and function (Aldezabal et al., 2009). The tags attached to the texts usually increase the utility of a text corpus by providing specificity in text identification and accuracy in information extraction (O'Donnell, 1999).…”
Section: What Is Corpus Annotation? | mentioning | confidence: 99%

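The distinction drawn here between intralinguistic tags (information about the words themselves) and extralinguistic information (information about the text sample as a whole) can be illustrated with a minimal Python sketch. It assumes a simple inline `word/TAG` convention with a metadata header; the tag set, the metadata fields, the toy lexicon-lookup tagger, and the names `AnnotatedSample` and `annotate_sample` are hypothetical and are not EPEC's actual annotation scheme.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSample:
    """A text sample plus the two kinds of added information."""
    metadata: dict[str, str]  # extralinguistic: genre, date, source, ...
    tokens: list[tuple[str, str]] = field(default_factory=list)  # intralinguistic: (word, tag)

    def to_inline(self) -> str:
        """Serialize as a header line followed by word/TAG tokens."""
        header = " ".join(f"{k}={v}" for k, v in sorted(self.metadata.items()))
        body = " ".join(f"{w}/{t}" for w, t in self.tokens)
        return f"<{header}>\n{body}"


def annotate_sample(text: str, lexicon: dict[str, str],
                    metadata: dict[str, str]) -> AnnotatedSample:
    """Attach a tag to each whitespace-separated word; unknown words get 'UNK'.

    Hypothetical toy tagger: a real corpus would use a full morphological analyser.
    """
    tokens = [(w, lexicon.get(w.lower(), "UNK")) for w in text.split()]
    return AnnotatedSample(metadata=metadata, tokens=tokens)


# "Gizonak liburua irakurri du" = "The man has read the book"
sample = annotate_sample(
    "Gizonak liburua irakurri du",
    {"gizonak": "NOUN", "liburua": "NOUN", "irakurri": "VERB", "du": "AUX"},
    {"genre": "news", "year": "2009"},
)
print(sample.to_inline())
```

Keeping the tags inline with the words is only one possible design; it keeps the corpus searchable with plain text tools, at the cost of making the raw text harder to read.
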
“…With the growth of Unicode, however, the need for the SAMPA character set is obviated, although major corpora/resources like CELEX still use it.² Phonemic annotation can be generated automatically from orthographic transcription via a pronunciation lexicon and/or rule-based algorithms. Fine phonetic transcription, on the other hand, makes use of an extended set of characters, including diacritics, and usually requires hand-coding by humans.…”
Section: Phonetic and Phonological Annotation | mentioning | confidence: 99%

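The automatic route described in this statement, a pronunciation-lexicon lookup with a rule-based fallback, can be sketched in a few lines of Python. The lexicon entries and grapheme-to-phoneme rules below are hypothetical toy data written in IPA, not taken from CELEX or any real resource, and greedy longest-match rule application is just one simple strategy; as the statement notes, fine phonetic transcription with diacritics would still need human coding.

```python
# Hypothetical toy pronunciation lexicon (orthography -> IPA phonemes).
LEXICON = {"the": "ðə", "corpus": "kɔːpəs", "one": "wʌn"}

# Hypothetical grapheme-to-phoneme rules; longer graphemes are tried first.
RULES = {
    "sh": "ʃ", "ch": "tʃ", "th": "θ", "ng": "ŋ",
    "a": "æ", "e": "ɛ", "i": "ɪ", "o": "ɒ", "u": "ʌ",
}
MAX_GRAPHEME = max(len(g) for g in RULES)


def rules_to_phonemes(word: str) -> str:
    """Greedy longest-match transcription; letters with no rule pass through."""
    out, i = [], 0
    while i < len(word):
        for size in range(min(MAX_GRAPHEME, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += size
                break
        else:
            out.append(word[i])  # e.g. 'p', 't', 's' are kept as-is
            i += 1
    return "".join(out)


def phonemic_annotation(text: str) -> list[tuple[str, str]]:
    """Pair each word with a phonemic form: lexicon first, rules as fallback."""
    return [(w, LEXICON.get(w) or rules_to_phonemes(w))
            for w in text.lower().split()]


print(phonemic_annotation("The corpus ships"))
```

Consulting the lexicon first lets irregular spellings be listed explicitly, while the rules give some transcription for words the lexicon does not cover.
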
“…10 for the Reference Corpus for the Processing of Basque (EPEC; cf. [2]), a 300K-word corpus of written Basque annotated morphologically (for part-of-speech, number, definiteness, and case), lexically (for named entities and multi-word units), and syntactically in a dependency-grammar format.…”
Section: Parsed Corpora (Inline/embedded) | mentioning | confidence: 99%

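The three annotation layers listed for EPEC (morphological, lexical, and dependency-syntactic) can be pictured as fields on a per-token record. The following is a minimal Python sketch under that assumption; the class and field names, the value codes, and the dependency labels (`subj`, `obj`, `aux`) are hypothetical placeholders, not EPEC's actual tag set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """One word carrying the three annotation layers (hypothetical field names)."""
    index: int                          # 1-based position in the sentence
    form: str
    # Morphological layer
    pos: str
    number: Optional[str] = None        # e.g. "sg" or "pl"
    definite: Optional[bool] = None
    case: Optional[str] = None          # e.g. "erg" (ergative), "abs" (absolutive)
    # Lexical layer
    named_entity: Optional[str] = None  # e.g. "PER" for a person name
    mwu_id: Optional[int] = None        # tokens sharing an id form one multi-word unit
    # Syntactic layer (dependency grammar)
    head: int = 0                       # index of the governing token; 0 means root
    deprel: str = "root"                # label of the relation to the head


def dependents(sentence: list[Token], head_index: int) -> list[Token]:
    """Return the tokens whose head is `head_index` (0 returns the root)."""
    return [tok for tok in sentence if tok.head == head_index]


# "Gizonak liburua irakurri du" ("The man has read the book"); labels are illustrative.
sentence = [
    Token(1, "Gizonak", "NOUN", "sg", True, "erg", head=3, deprel="subj"),
    Token(2, "liburua", "NOUN", "sg", True, "abs", head=3, deprel="obj"),
    Token(3, "irakurri", "VERB"),
    Token(4, "du", "AUX", head=3, deprel="aux"),
]

for tok in dependents(sentence, 3):
    print(tok.form, tok.deprel, tok.case)
```

Storing only each token's head index is what makes this a dependency representation: the whole tree can be recovered by grouping tokens by `head`, as `dependents` does.
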
“…With that first version, it was evaluated in the CoNLL 2007 Multilingual Dependency Parsing shared task, together with 9 other languages and across 20 systems. In 2010, the Basque EPEC-DEP treebank [4], annotated in a dependency framework, was adapted to the CoNLL-X format so that it could be used by dependency-based parser generators. After conversion to the CoNLL-X format, a Basque treebank of 150,000 words and 11,225 sentences was obtained (henceforth referred to as the EZB II treebank).…”
Section: For Improving the Statistical Syntactic Parser for Basque | unclassified
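
The CoNLL-X format mentioned in this statement stores one token per line as ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), uses `_` for unavailable values, and separates sentences with a blank line. A minimal Python reader, which also produces sentence and token counts of the kind quoted above, might look like the sketch below; the function names and the sample sentence's tags, features, and labels are hypothetical placeholders rather than EPEC-DEP data.

```python
import io
from typing import Iterator, TextIO

# The ten CoNLL-X columns, in order.
FIELDS = ("id", "form", "lemma", "cpostag", "postag",
          "feats", "head", "deprel", "phead", "pdeprel")


def read_conllx(stream: TextIO) -> Iterator[list[dict[str, str]]]:
    """Yield sentences as lists of token dicts; a blank line ends a sentence."""
    sentence: list[dict[str, str]] = []
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(columns)}: {line!r}")
        sentence.append(dict(zip(FIELDS, columns)))
    if sentence:  # the file may not end with a blank line
        yield sentence


def corpus_size(stream: TextIO) -> tuple[int, int]:
    """Return (number of sentences, number of tokens)."""
    n_sent = n_tok = 0
    for sent in read_conllx(stream):
        n_sent += 1
        n_tok += len(sent)
    return n_sent, n_tok


# Hypothetical one-sentence sample; tags, features and labels are placeholders.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "Gizonak", "gizon", "NOUN", "NOUN", "case=erg|num=sg", "3", "subj", "_", "_"),
    ("2", "liburua", "liburu", "NOUN", "NOUN", "case=abs|num=sg", "3", "obj", "_", "_"),
    ("3", "irakurri", "irakurri", "VERB", "VERB", "_", "0", "ROOT", "_", "_"),
    ("4", "du", "_", "AUX", "AUX", "_", "3", "aux", "_", "_"),
]) + "\n\n"

print(corpus_size(io.StringIO(SAMPLE)))
```

Rejecting lines without exactly ten columns catches conversion errors early, which matters when a treebank is being moved into this format for use by parser generators.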