This article describes the system submitted by the RGCL-WLV team to SemEval 2019 Task 12: Toponym resolution in scientific papers. The system detects toponyms using a bootstrapped machine learning (ML) approach that classifies names identified using gazetteers extracted from the GeoNames geographical database. The paper evaluates the performance of several ML classifiers, as well as how the gazetteers influence the accuracy of the system. Several runs were submitted. The highest precision achieved by one of the submissions was 89%, albeit at a relatively low recall of 49%.
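The gazetteer step described above can be illustrated with a minimal sketch. It assumes a tiny hand-written gazetteer (the submitted system derives its lists from GeoNames) and a hypothetical helper, `find_candidates`, that performs longest-match lookup over tokens. In the described approach, each candidate would then be passed to an ML classifier that decides whether it is a true toponym. That classifier is not shown here.

```python
import re

# Hypothetical toy gazetteer; the real system derives its lists from GeoNames.
GAZETTEER = {("new", "york"), ("york",), ("china",), ("hong", "kong")}
MAX_LEN = max(len(entry) for entry in GAZETTEER)


def find_candidates(text):
    """Return (start_token, end_token, surface) for longest gazetteer matches."""
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    candidates = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest span first so "New York" wins over "York".
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in GAZETTEER:
                match_len = n
                break
        if match_len:
            surface = " ".join(tokens[i:i + match_len])
            candidates.append((i, i + match_len, surface))
            i += match_len  # skip past the matched span
        else:
            i += 1
    return candidates


if __name__ == "__main__":
    print(find_candidates("Samples were collected in New York and Hong Kong."))
```

Gazetteer lookup alone tends to over-generate (for example, person names that coincide with place names), which is why a classification step over the candidates is needed.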
Terminologies in the biomedical field are one of the main resources used in clinical practice. Keeping them up to date to meet real-world use cases is a critical operation that, even in the case of well-maintained terminologies such as SNOMED-CT, involves much effort from domain experts. Pharmacological products or drugs are constantly being approved and made available on the market, and their clinical information should also be updated in terminologies. Each new drug is provided with its Summary of Product Characteristics (SPC), a document in natural language that contains its essential information. This paper proposes a method for populating the Spanish extension of SNOMED-CT with drug names using SPCs and representing their clinical data sections in the terminology. More precisely, the method has been applied to the therapeutic indication and the adverse reaction sections, in which disease names are recognized as named entities in the document and mapped to the terminology. The relations between the drug name and the mapped entities are also represented in the terminology based on the specific roles that they have in the document.
Domain-specific terminologies play a central role in many language technology solutions. Substantial manual effort is still involved in the creation of such resources, and many of them are published in proprietary formats that cannot be easily reused in other applications. Automatic term extraction tools help alleviate this cumbersome task. However, their results usually take the form of plain lists of terms or unstructured data with limited linguistic information. Initiatives such as the Linguistic Linked Open Data cloud (LLOD) foster the publication of language resources in open structured formats, specifically RDF, and their linking to other resources on the Web of Data. In order to leverage the wealth of linguistic data in the LLOD and speed up the creation of linked terminological resources, we propose TermitUp, a service that generates enriched domain-specific terminologies directly from corpora and publishes them in open and structured formats. TermitUp is composed of five modules performing terminology extraction, terminology post-processing, terminology enrichment, term relation validation and RDF publication. As part of the pipeline implemented by this service, existing resources in the LLOD are linked with the resulting terminologies, contributing in this way to the population of the LLOD cloud. TermitUp has been used in the framework of European projects tackling different fields, such as the legal domain, with promising results. Different alternatives for modelling enriched terminologies are considered, and good practices illustrated with examples are proposed.
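The five-stage flow described above (extraction, post-processing, enrichment, relation validation, RDF publication) can be sketched as a chain of functions. This is only an illustration of the pipeline shape, not TermitUp's implementation: every function name, the frequency-based extractor, the stopword list, the `external` dictionary standing in for an LLOD resource, and the `http://example.org/term/` base URI are assumptions. The SKOS properties `prefLabel` and `related` are used as one plausible modelling choice.

```python
import re
from collections import Counter

SKOS = "http://www.w3.org/2004/02/skos/core#"


def extract_terms(corpus, min_count=2):
    # Stand-in extractor: words of 5+ letters that occur at least min_count times.
    words = re.findall(r"[a-z]{5,}", corpus.lower())
    return [w for w, c in Counter(words).items() if c >= min_count]


def postprocess(terms, stopwords=frozenset({"which", "their"})):
    # Remove noise and normalise ordering.
    return sorted(t for t in terms if t not in stopwords)


def enrich(terms, external):
    # `external` is a hypothetical term -> related-term lookup.
    return [(t, external[t]) for t in terms if t in external]


def validate(relations):
    # Toy validation: discard self-relations.
    return [(a, b) for a, b in relations if a != b]


def to_ntriples(terms, relations, base="http://example.org/term/"):
    # Serialise labels and relations as N-Triples lines.
    lines = [f'<{base}{t}> <{SKOS}prefLabel> "{t}"@en .' for t in terms]
    lines += [f"<{base}{a}> <{SKOS}related> <{base}{b}> ." for a, b in relations]
    return lines


def run_pipeline(corpus, external):
    terms = postprocess(extract_terms(corpus))
    relations = validate(enrich(terms, external))
    return to_ntriples(terms, relations)


if __name__ == "__main__":
    text = ("The contract binds the employer. "
            "The employer signs the contract with their worker.")
    for line in run_pipeline(text, {"contract": "agreement"}):
        print(line)
```

Keeping each stage as a separate function mirrors the modular design named in the abstract and lets any single stage be swapped out without touching the others.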
Named Entity Recognition (NER) poses new challenges in real-world documents in which entities have different roles according to their purpose or meaning. When only the subset of entities with a given role is needed, retrieving all possible entities introduces noise that lowers overall precision. This work proposes a NER model that relies on role classification models to recognize entities with a specific role. The proposed model has been implemented in two use cases using Spanish drug Summaries of Product Characteristics: identification of therapeutic indications and identification of adverse reactions. The results show how precision increases when using a NER model that is oriented towards a specific role and discards out-of-scope entities.
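The idea of discarding out-of-scope entities can be illustrated with a minimal sketch. It assumes the entities have already been recognized, and it replaces the trained role classification models of the paper with a hypothetical cue-phrase lookup. The `Entity` class, `ROLE_CUES`, `classify_role`, and `filter_by_role` are all illustrative names. English sentences are used for readability, whereas the paper works on Spanish documents.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    text: str      # surface form of the recognized entity
    sentence: str  # sentence in which it was found


# Hypothetical cue phrases standing in for a trained role classifier.
ROLE_CUES = {
    "therapeutic_indication": ("indicated for", "treatment of"),
    "adverse_reaction": ("may cause", "adverse reaction"),
}


def classify_role(entity):
    """Assign a role from the sentence context, or 'other' if no cue matches."""
    sentence = entity.sentence.lower()
    for role, cues in ROLE_CUES.items():
        if any(cue in sentence for cue in cues):
            return role
    return "other"


def filter_by_role(entities, target_role):
    """Keep only the entities whose predicted role matches the target."""
    return [e for e in entities if classify_role(e) == target_role]


if __name__ == "__main__":
    found = [
        Entity("hypertension", "The drug is indicated for hypertension."),
        Entity("headache", "It may cause headache."),
        Entity("diabetes", "Use with caution in patients with diabetes."),
    ]
    print([e.text for e in filter_by_role(found, "adverse_reaction")])
```

In this toy input, a role-agnostic recognizer would return all three disease names, while the role filter keeps only the one relevant to the adverse reaction use case.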