Izaskun Aldezabal scite author profile

Semantic interpretation of language requires extensive and rich lexical knowledge bases (LKB). The Basque WordNet is an LKB based on WordNet and its multilingual counterparts EuroWordNet and the Multilingual Central Repository. This paper reviews the theoretical and practical aspects of the Basque WordNet lexical knowledge base, as well as the steps and methodology followed in its construction. Our methodology is based on the joint development of wordnets and annotated corpora. The Basque WordNet contains 32,456 synsets and 26,565 lemmas, and is complemented by a hand-tagged corpus comprising 59,968 annotations.

Learning argument/adjunct distinction for Basque

Aranzabe

Gojenola

et al. 2002

This paper presents experiments performed on lexical knowledge acquisition in the form of verbal argumental information. The system obtains the data from raw corpora after the application of a partial parser and statistical filters. We used two different statistical filters to acquire the argumental information: Mutual Information and Fisher's Exact test. Due to the characteristics of agglutinative languages like Basque, the usual classification of arguments in terms of their syntactic category (such as NP or PP) is not suitable. For that reason, the arguments are classified into 48 different kinds of case markers, which makes the system fine-grained compared to equivalent systems that have been developed for other languages. This work addresses the problem of distinguishing arguments from adjuncts, which is one of the most significant sources of noise in subcategorization frame acquisition.
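The two statistical filters named in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function names, the counts, and the choice of a one-sided test are assumptions made purely for illustration. It scores one verb and one case marker from co-occurrence counts, using pointwise mutual information and a right-tailed Fisher's exact test built on the hypergeometric distribution.

```python
from math import comb, log2


def pmi(joint, verb_total, case_total, n):
    """Pointwise mutual information (in bits) between a verb and a case marker.

    joint      -- clauses where the verb occurs with the case marker
    verb_total -- clauses containing the verb
    case_total -- clauses containing the case marker
    n          -- total number of clauses in the corpus
    """
    return log2((joint * n) / (verb_total * case_total))


def fisher_right_tail(joint, verb_total, case_total, n):
    """One-sided Fisher's exact test.

    Returns the probability of seeing at least `joint` co-occurrences
    if the verb and the case marker were independent (hypergeometric tail).
    """
    upper = min(verb_total, case_total)
    favourable = sum(
        comb(case_total, k) * comb(n - case_total, verb_total - k)
        for k in range(joint, upper + 1)
    )
    return favourable / comb(n, verb_total)


# Hypothetical counts: a verb seen in 10 clauses, a case marker seen in 20,
# 8 clauses containing both, in a toy corpus of 100 clauses.
score = pmi(8, 10, 20, 100)
p_value = fisher_right_tail(8, 10, 20, 100)
print(f"PMI = {score:.2f} bits, Fisher p = {p_value:.2e}")
```

A high PMI together with a small p-value suggests the case marker is associated with the verb more strongly than chance would predict, which is the kind of evidence a filter could use to favour an argument reading over an adjunct reading. Any cut-off values would have to be tuned on real data.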

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues

Aldezabal

Aranzabe

Arriola

et al. 2009

A word-grammar based morphological analyzer for agglutinative languages

Aduriz

Agirre

et al. 2000

Agglutinative languages present rich morphology and for some applications they need deep analysis at word level. The work here presented proposes a model for designing a full morphological analyzer. The model integrates the two-level formalism and a unification-based formalism. In contrast to other works, we propose to separate the treatment of sequential and non-sequential morphotactic constraints. Sequential constraints are applied in the segmentation phase, and non-sequential ones in the final feature-combination phase. Early application of sequential morphotactic constraints during the segmentation process makes feasible an efficient implementation of the full morphological analyzer. The result of this research has been the design and implementation of a full morphosyntactic analysis procedure for each word in unrestricted Basque texts.

A methodology for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level following the PropBank-VerbNet model

Estarrona

Digital Scholarship Humanities

Ilarraza

et al. 2015

In this article we describe the methodology developed for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level that follows the PropBank-VerbNet model. The methodology presented is the product of detailed theoretical study of the semantic nature of verbs in Basque and of their similarities and differences with verbs in other languages. As part of the proposed methodology, we are creating a Basque lexicon on the PropBank-VerbNet model that we have named the Basque Verb Index (BVI). Our work thus dovetails with the general trend toward building lexicons from tagged corpora that is clear in work conducted for other languages. EPEC-RolSem and BVI are two important resources for the computational semantic processing of Basque; as far as the authors are aware, they are also the first resources of their kind developed for Basque. In addition, each entry in BVI is linked to the corresponding verb-entry in well-known resources like PropBank, VerbNet, WordNet, FrameNet, and Levin's classification. We have also implemented several automatic processes to aid in creating and annotating the BVI, including processes designed to facilitate the task of manual annotation.

Designing spelling correctors for inflected languages using lexical transducers

Alegria

Ansa

et al. 1999

How the corpus-based Basque Verb Index lexicon was built

Estarrona

Lang Resources & Evaluation

Ilarraza

2018

A Pilot Study of English Selectional Preferences and Their Cross-Lingual Compatibility with Basque

Agirre

Pociello

2003