Viviana Cotik scite author profile

Stricker

Vivaldi

et al. 2016

Identification of the certainty of events is an important text mining problem. In particular, biomedical texts report medical conditions or findings that might be factual, hedged or negated. Identification of negation and its scope over a term of interest determines whether a finding is reported and is a challenging task. Not much work has been performed for Spanish in this domain. In this work we introduce different algorithms developed to determine if a term of interest is under the scope of negation in radiology reports written in Spanish. The methods include syntactic techniques based on rules derived from PoS tagging patterns, constituent tree patterns and dependency tree patterns, and an adaptation of NegEx, a well-known rule-based negation detection algorithm (Chapman et al., 2001a). All methods outperform a simple dictionary lookup algorithm developed as a baseline. NegEx and the PoS tagging pattern method obtain the best results, with an F1 of 0.92.

A hybrid promoter analysis methodology for prokaryotic genomes

Romero-Záliz

Zwir

2005

Fuzzy Sets and Systems

Annotation of Entities and Relations in Spanish Radiology Reports

Filippo

Roller

et al. 2017

Radiology reports express the results of a radiology study and contain information about anatomical entities, findings, measures and impressions of the medical doctor. The use of information extraction techniques can help physicians to access this information in order to understand data and to infer further knowledge. Supervised machine learning methods are very popular for information extraction, but are usually domain and language dependent. To train new classification models, annotated data is required. Moreover, annotated data is also required as an evaluation resource for information extraction algorithms. However, one major drawback of processing clinical data is the low availability of annotated datasets. For this reason we performed a manual annotation of radiology reports written in Spanish. This paper presents the corpus, the annotation schema, the annotation guidelines and further insight into the data.

Spanish Named Entity Recognition in the Biomedical Domain

Rodríguez

Vivaldi

2019

Named Entity Recognition in the clinical domain and in languages other than English is hampered by the absence of complete dictionaries, the informality of texts, the polysemy of terms, the lack of agreement on entity boundaries, and the scarcity of corpora and other resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested with Spanish radiology reports and compared with a conditional random fields system.

A Corpus for Outbreak Detection of Diseases Prevalent in Latin America

Dellanzo

Luna

2020

In this paper we present an annotated corpus which can be used for training and testing algorithms to automatically extract information about disease outbreaks from news and health reports. We also propose initial approaches to extract information from it. The corpus has been constructed with two main tasks in mind. The first is to extract entities about outbreaks, such as disease, host and location, among others. The second is to retrieve relations among entities, for instance, that fifteen cases of a given disease were reported in a given geographic location. Overall, our goal is to offer resources and tools to perform an automated analysis so as to support early detection of disease outbreaks and thereby reduce their spread.

Automatic Detection of Negated Findings with NooJ: First Results

Koza

Muñoz

Rivas

et al. 2018

Assessing the Impact of Contextual Information in Hate Speech Detection

Pérez

Luque

Zayat

et al. 2023

IEEE Access

In recent years, hate speech has gained relevance in social networks and other virtual media because of its intensity and its relationship with violent acts against members of protected groups. Due to the huge amount of user-generated content, a great effort has been made to develop automatic tools to aid the analysis and moderation of this speech, at least in its most threatening forms. One of the limitations of current approaches to automatic hate speech detection is the lack of context; most studies and resources focus on isolated messages, without considering any type of conversational context or even the topic being discussed. This severely restricts the information available for determining whether a post on a social network should be tagged as hateful or not. In this work, we assess the impact of adding contextual information to the hate speech detection task. In particular, we study a Twitter subdomain consisting of replies to posts by news outlets, which provides a natural environment for contextualized hate speech detection. We collected a novel corpus in the Rioplatense dialectal variety of Spanish focusing on hate speech associated with the COVID-19 pandemic, and manually annotated it using carefully designed guidelines. Our classification experiments using state-of-the-art transformer-based machine learning techniques show evidence that adding contextual information improves the performance of hate speech detection for two proposed tasks (binary and multilabel prediction), increasing their Macro F1 by 4.2 and 5.5 points, respectively. These results highlight the importance of exploiting contextual information for the task of hate speech detection. We make our code, models, and corpus available for further research.

Arabic medical entity tagging using distant learning in a Multilingual Framework

Journal of King Saud University - Computer and Information Scie

Rodríguez

Vivaldi

2017