Corpus Linguistics Around the World 2006
DOI: 10.1163/9789401202213_002

Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for automatic processing

Cited by 15 publications (11 citation statements)
References 7 publications
“…3 Annotated Corpus of Basque EPEC (Reference Corpus for the Processing of Basque) (Aduriz et al, 2006) is a 300,000 word sample collection of standard written Basque that has been manually annotated at different levels (morphology, surface syntax, phrases, etc.). The corpus is composed by news published in Euskaldunon Egunkaria, a Basque language newspaper.…”
Section: Related Work (mentioning)
confidence: 99%
“…For our experiments we use the EPEC corpus annotated for coreference (Aduriz et al, 2006) and we run experiments across two dimensions. First, we use a baseline model based on (Soon et al, 2001) vs. a model that includes extra features reliably extracted for Basque with the tools at hand.…”
Section: Introduction (mentioning)
confidence: 99%
“…The corpus used to carry out the error analysis is a part of EPEC (the Reference Corpus for the Processing of Basque) (Aduriz et al, 2006). EPEC is a 300,000 word sample collection of news published in Euskaldunon Egunkaria, a Basque language newspaper.…”
Section: Error Analysis (mentioning)
confidence: 99%
“…The first Basque corpus [3] was adapted to the CoNLL-X format so that it could be used by dependency-based parser generators. After the adaptation, the first Basque treebank in CoNLL-X format was obtained, comprising 55,469 words and 3,700 sentences (hereafter referred to as the EZB I treebank).…”
Section: Improving the Statistical Syntactic Parser for Basque (unclassified)