Abstract. This article presents a robust syntactic analyser for Basque and the different modules it contains. Each module is structured in analysis layers, where each layer takes the information provided by the previous layer as its input, thus creating a gradually deeper syntactic analysis in cascade. This analysis is carried out using the Constraint Grammar (CG) formalism. Moreover, the article describes the standardisation process of the parsing formats using XML.
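The cascade idea can be illustrated with a minimal Python sketch. It is not the analyser described in the abstract and does not use Constraint Grammar: the toy lexicon, the three layer functions and the tag names are all hypothetical, chosen only to show how each layer reads the annotations left by the previous one.

```python
from functools import reduce

# Hypothetical toy lexicon (English forms for readability; not from the paper).
LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB"}

def pos_layer(tokens):
    """Layer 1: attach a part-of-speech tag to every token."""
    return [{**t, "pos": LEXICON.get(t["form"], "UNK")} for t in tokens]

def chunk_layer(tokens):
    """Layer 2: use the layer-1 tags to mark noun-phrase membership."""
    return [
        {**t, "chunk": "NP" if t["pos"] in {"DET", "NOUN"} else "O"}
        for t in tokens
    ]

def function_layer(tokens):
    """Layer 3: use layer-1 and layer-2 output to assign a coarse function."""
    def func(t):
        if t["chunk"] == "NP" and t["pos"] == "NOUN":
            return "HEAD"
        if t["pos"] == "VERB":
            return "MAIN"
        return "-"
    return [{**t, "func": func(t)} for t in tokens]

def run_cascade(forms, layers):
    """Feed the output of each layer into the next one, in order."""
    tokens = [{"form": f} for f in forms]
    return reduce(lambda acc, layer: layer(acc), layers, tokens)

result = run_cascade(
    ["the", "dog", "barks"],
    [pos_layer, chunk_layer, function_layer],
)
```

Because every layer only adds keys to the token dictionaries, later layers can rely on earlier annotations without the layers knowing about each other.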
In this paper we present the coreferential tagging of part of the EPEC Corpus of Basque. Although coreference is a pragmatic linguistic phenomenon highly dependent on the situational context, it shows some language-specific patterns that vary according to the features of each language. Because Basque is not an Indo-European language, its grammar differs considerably from that of the languages spoken in surrounding areas. We will explain these features and the decisions made in each case. After describing the criteria defined for coreferential tagging in Basque, we explain the annotation process. Our annotation is based on a morphologically and syntactically annotated corpus that provides us with a manageable environment, in which the specific structures that are part of a reference chain can be more easily identified. A part of the corpus was tagged by two annotators who marked up the same text independently, and by a third annotator who acted as judge, resolving cases of disagreement. The whole process has been automated as a result of previous studies carried out in this field. The automatic detection of mentions (Soraluze et al., in: Proceedings of Konvens, 2012) has provided us with a better working environment, and given us the possibility to build a first significant corpus for a later computational treatment of automatic coreference resolution.
This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque. Linguistic information about a set of VNCs is gathered from the public database Konbitzul and integrated into the MT system, leading to an improvement in BLEU, NIST and TER scores, and to results that human evaluators judged significantly better.
The papers have been peer-reviewed using a double-blind system.
Design and layout: Kö estudio. Printing: Linegrafic.
Agglutinative languages present rich morphology and for some applications they need deep analysis at word level. The work here presented proposes a model for designing a full morphological analyzer. The model integrates the two-level formalism and a unification-based formalism. In contrast to other works, we propose to separate the treatment of sequential and non-sequential morphotactic constraints. Sequential constraints are applied in the segmentation phase, and non-sequential ones in the final feature-combination phase. Early application of sequential morphotactic constraints during the segmentation process makes feasible an efficient implementation of the full morphological analyzer. The result of this research has been the design and implementation of a full morphosyntactic analysis procedure for each word in unrestricted Basque texts.
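The separation of the two kinds of constraints can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the morphemes belong to an invented mini-language, the category names and the `LEXICON` layout are hypothetical, segmentation is plain string matching (no two-level phonological rules), and unification is reduced to merging flat feature dictionaries.

```python
# Invented morphemes: form -> (category, features, categories allowed next).
LEXICON = {
    "kat": ("STEM", {"pos": "noun"}, {"NUM", "CASE"}),
    "ta": ("NUM", {"num": "pl"}, {"CASE"}),
    "ni": ("CASE", {"case": "loc"}, set()),
    "so": ("CASE", {"case": "erg", "num": "sg"}, set()),
}

def segment(word, allowed=frozenset({"STEM"})):
    """Phase 1: enumerate segmentations, pruning early with the
    sequential constraint 'which category may follow which'."""
    if not word:
        return [[]]
    results = []
    for form, (cat, _feats, nxt) in LEXICON.items():
        if cat in allowed and word.startswith(form):
            for rest in segment(word[len(form):], nxt):
                results.append([form] + rest)
    return results

def unify(a, b):
    """Merge two flat feature dicts; return None if any value clashes."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged

def combine(morphemes):
    """Phase 2: combine the features of all morphemes; a clash between
    non-adjacent morphemes (a non-sequential constraint) rejects the analysis."""
    feats = {}
    for m in morphemes:
        feats = unify(feats, LEXICON[m][1])
        if feats is None:
            return None
    return feats

def analyse(word):
    """Run segmentation, then keep only analyses whose features unify."""
    analyses = []
    for seg in segment(word):
        feats = combine(seg)
        if feats is not None:
            analyses.append((seg, feats))
    return analyses
```

In this toy setting `analyse("kattani")` yields one analysis, `analyse("kattakat")` is already rejected during segmentation because a stem may not follow a number suffix, and `analyse("kattaso")` passes segmentation but fails in the feature-combination phase because the plural and singular number features clash.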