Xabier Saralegi scite author profile

Xabier Saralegi

5 Publications

34 Citation Statements Received

86 Citation Statements Given

Publications

EliXa: A Modular and Flexible ABSA Platform

Vicente, Saralegi, Agerri (2015)

This paper presents a supervised Aspect Based Sentiment Analysis (ABSA) system. Our aim is to develop a modular platform that makes it easy to conduct experiments by replacing modules or adding new features. We obtain the best result in the Opinion Target Extraction (OTE) task (slot 2) using an off-the-shelf sequence labeler. The target polarity classification (slot 3) is addressed by means of a multiclass SVM algorithm which includes lexical-based features such as the polarity values obtained from domain and open polarity lexicons. The system obtains accuracies of 0.70 and 0.73 for the restaurant and laptop domains respectively, and performs second best on the out-of-domain hotel data, achieving an accuracy of 0.80.

Information retrieval and question answering: A case study on COVID-19 scientific literature

Otegi, Vicente, Saralegi et al. (2022)

Knowledge-Based Systems

Biosanitary experts around the world are directing their efforts towards the study of COVID-19. This effort generates a large volume of scientific publications at a speed that makes the effective acquisition of new knowledge difficult. Therefore, Information Systems are needed to assist biosanitary experts in accessing, consulting and analyzing these publications. In this work we study the variables involved in the development of a Question Answering system that receives a set of questions asked by experts about the disease COVID-19 and its causal virus SARS-CoV-2, and provides a ranked list of expert-level answers to each question. In particular, we address the interrelation of the Information Retrieval and the Answer Extraction steps. We found that recall-based document retrieval, which leaves the scanning of whole documents for the best answer to a neural answer extraction module, is a better strategy than relying on precise passage retrieval before extracting the answer span.

Give your Text Representation Models some Love: the Case for Basque

Agerri, Vicente, Campos et al. (2020)

Preprint

Word embeddings and pre-trained language models make it possible to build rich representations of text and have enabled improvements across most NLP tasks. Unfortunately, they are very expensive to train, and many small companies and research groups tend to use models that have been pre-trained and made available by third parties, rather than building their own. This is suboptimal as, for many languages, the models have been trained on smaller (or lower quality) corpora. In addition, monolingual pre-trained models for non-English languages are not always available. At best, models for those languages are included in multilingual versions, where each language shares the quota of substrings and parameters with the rest of the languages. This is particularly true for smaller languages such as Basque. In this paper we show that a number of monolingual models (FastText word embeddings, FLAIR and BERT language models) trained with larger Basque corpora produce much better results than publicly available versions in downstream NLP tasks, including topic classification, sentiment classification, PoS tagging and NER. This work sets a new state-of-the-art in those tasks for Basque. All benchmarks and models used in this work are publicly available.

Comparing Different Approaches to Treat Translation Ambiguity in CLIR: Structured Queries vs. Target Co-occurrence Based Selection

Saralegi, Lacalle (2009)

Two main problems in Cross-language Information Retrieval are translation selection and the treatment of out-of-vocabulary terms. In this paper, we focus on the problem of translation selection. Structured queries and target co-occurrence-based methods seem to be the most appropriate approaches when parallel corpora are not available. However, there is no comparative study. In this paper we compare the results obtained using each of the aforementioned methods, specify the weaknesses of each method, and finally propose a hybrid method that combines both. In terms of mean average precision, results for Basque-English cross-lingual retrieval show that structured queries are the best approach with both long and short queries.

Not Enough Data to Pre-train Your Language Model? MT to the Rescue!

Urbizu, Vicente, Saralegi et al. (2023)
