Maxux Aranzabe scite author profile

This paper presents experiments performed on lexical knowledge acquisition in the form of verbal argumental information. The system obtains the data from raw corpora after the application of a partial parser and statistical filters. We used two different statistical filters to acquire the argumental information: Mutual Information, and Fisher's Exact test. Due to the characteristics of agglutinative languages like Basque, the usual classification of arguments in terms of their syntactic category (such as NP or PP) is not suitable. For that reason, the arguments will be classified in 48 different kinds of case markers, which makes the system fine grained if compared to equivalent systems that have been developed for other languages. This work addresses the problem of distinguishing arguments from adjuncts, this being one of the most significant sources of noise in subcategorization frame acquisition.

show abstract

A Cascaded Syntactic Analyser for Basque

Aduriz

Aranzabe

Arriola

et al. 2004

View full text Add to dashboard Cite

Abstract. This article presents a robust syntactic analyser for Basque and the different modules it contains. Each module is structured in different analysis layers for which each layer takes the information provided by the previous layer as its input; thus creating a gradually deeper syntactic analysis in cascade. This analysis is carried out using the Constraint Grammar (CG) formalism. Moreover, the article describes the standardisation process of the parsing formats using XML.

show abstract

Detecting Apposition for Text Simplification in Basque

Gonzalez-Dios

Aranzabe

Ilarraza

et al. 2013

View full text Add to dashboard Cite

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues

Aldezabal¹,

Aranzabe²,

Arriola³

et al. 2009

View full text Add to dashboard Cite

Making Biographical Data in Wikipedia Readable: A Pattern-based Multilingual Approach

Gonzalez-Dios

Aranzabe²,

Ilarraza³

2014

View full text Add to dashboard Cite

In this paper we present Biografix, a pattern based tool that simplifies parenthetical structures with biographical information, whose aim is to create simple, readable and accessible sentences. To that end, we analysed the parenthetical structures that appear in the first paragraph of the Basque Wikipedia, and concentrated on biographies. Although it has been designed and developed for Basque we adapted it and evaluated with other five languages. We also perform an extrinsic evaluation with a question generation system to see if Biografix improve its results.

show abstract

A methodology for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level following the PropBank-VerbNet model

Estarrona

Aldezabal

Ilarraza

et al. 2015

Digital Scholarship Humanities

View full text Add to dashboard Cite

In this article we describe the methodology developed for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level that follows the PropBank-VerbNet model. The methodology presented is the product of detailed theoretical study of the semantic nature of verbs in Basque and of their similarities and differences with verbs in other languages. As part of the proposed methodology, we are creating a Basque lexicon on the PropBank-VerbNet model that we have named the Basque Verb Index (BVI). Our work thus dovetails with the general trend toward building lexicons from tagged corpora that is clear in work conducted for other languages. EPEC-RolSem and BVI are two important resources for the computational semantic processing of Basque; as far as the authors are aware, they are also the first resources of their kind developed for Basque. In addition, each entry in BVI is linked to the corresponding verb-entry in well-known resources like PropBank, VerbNet, WordNet, FrameNet, and Levin's classification. We have also implemented several automatic processes to aid in creating and annotating the BVI, including processes designed to facilitate the task of manual annotation.

show abstract

Dependentzia Unibertsalen eredura egokitutako euskarazko zuhaitz-bankua

Aranzabe

Atutxa

Bengoetxea

et al. 2019

EKAIA

View full text Add to dashboard Cite

Hizkuntzaren Prozesamenduan kokatzen den Dependentzia Unibertsalen proiektuaren helburua da hainbat hizkuntzatan sortu diren dependentzia-ereduan oinarritutako zuhaitz-bankuak etiketatze-eskema estandar berera egokitzea. Artikulu honetan, eredu horretara automatikoki egokitu den euskarazko zuhaitz-bankua aurkezten da; halaber, egokitzapen-lan hori nola gauzatu den deskribatzen da eta, azkenik, horretan oinarrituta, azaltzen da zer antzekotasun eta zer desberdintasun diren jatorrizko zuhaitza-bankuaren eta Dependentzia Unibertsalen eredura egokitutako zuhaitz-bankuaren artean.

show abstract

12 3

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Maxux Aranzabe

Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for automatic processing

Learning argument/adjunct distinction for Basque

A Cascaded Syntactic Analyser for Basque

Detecting Apposition for Text Simplification in Basque

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues

Making Biographical Data in Wikipedia Readable: A Pattern-based Multilingual Approach

A methodology for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level following the PropBank-VerbNet model

Dependentzia Unibertsalen eredura egokitutako euskarazko zuhaitz-bankua

Contact Info

Product

Resources

About