Proceedings of the 18th BioNLP Workshop and Shared Task 2019
DOI: 10.18653/v1/w19-5017

First Steps towards Building a Medical Lexicon for Spanish with Linguistic and Semantic Information

Abstract: We report the work-in-progress of collecting MedLexSp, a unified medical lexicon for the Spanish language, featuring terms and inflected word forms mapped to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs), semantic types and groups. First, we leveraged a list of term lemmas and forms from a previous project, and mapped them to UMLS terms and CUIs. To enrich the lexicon, we used both domain corpora (e.g. Summaries of Product Characteristics and MedlinePlus) and natural language proces…

Citations: cited by 9 publications (5 citation statements)
References: 40 publications
“…In particular, for the clinical domain, several authors have explored combinations of general language, biomedical literature, and clinical corpora [32–34]. For the clinical Spanish language there is a lack of language resources, with the few corpora coming predominantly from Spain [18, 35, 36]. In the work presented here, we extrinsically tested the Spanish Billion Word Corpus Embeddings (computed over 2,024,959,560 tokens) [37], the Chilean Biomedical corpus (computed over 67,246,025 tokens) [38], and the general dataset described earlier (computed over 56,079,828 tokens), with the latest showing the best classification performance (see “Results” section).…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…Few authors have researched this issue and achieved good results. For example, Llanos LC (2019) [62] used hybrid NLP methods such as string distance methods or the generation of syntactic variants of standard thesauri in the Spanish language. The author expanded vocabulary coverage by gathering missing terms from various medical resources and matching term variants to missing items available in Spanish versions of the standard thesauri.…”
Section: Challenges Identified and Scope Of Work In Healthcare Using ... (citation type: mentioning)
confidence: 99%
“…Currently, we have a subset of 2,000 annotated medical referrals in the Chilean Waiting List Corpus, whose medical entities were normalized automatically using the MedLexSp lexicon [38, 39], assigning one or multiple unique identification codes to each entity. This resource will soon be freely available.…”
Section: Automatic Coding (citation type: unclassified)
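
The citation statements above mention normalizing entities with a lexicon (one or several codes per entity) and matching term variants by string distance. The following is a minimal sketch of that general idea, not the method of MedLexSp or of any citing paper. The toy lexicon, the placeholder codes (`CUI_A` and so on, which are not real UMLS CUIs), the function names, and the 0.85 similarity cutoff are all assumptions made for illustration; the fuzzy step uses the similarity ratio from Python's standard `difflib` module rather than any specific distance named in the source.

```python
import difflib
import unicodedata

# Hypothetical toy lexicon: normalized term form -> set of placeholder codes.
# These identifiers are invented for illustration and are NOT real UMLS CUIs.
TOY_LEXICON = {
    "diabetes": {"CUI_A"},
    "hipertension": {"CUI_B"},
    "hipertension arterial": {"CUI_B"},
    "fractura": {"CUI_C", "CUI_D"},  # one form may map to several concepts
}


def normalize(text):
    """Lowercase, strip combining diacritics, and collapse whitespace.

    Note that this also folds "ñ" to "n", which is a simplification.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.split())


def lookup(mention, lexicon=TOY_LEXICON, cutoff=0.85):
    """Return the set of codes for a mention.

    Tries an exact match on the normalized form first, then falls back to
    the single closest lexicon entry whose similarity ratio is >= cutoff.
    Returns an empty set when nothing is close enough.
    """
    key = normalize(mention)
    if key in lexicon:
        return set(lexicon[key])
    close = difflib.get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    return set(lexicon[close[0]]) if close else set()
```

With this toy data, `lookup("Hipertensión")` resolves through the exact-match path after accent stripping, `lookup("fracturas")` falls back to the closest form `fractura` and returns both of its placeholder codes, and an unrelated mention such as `lookup("asma")` returns an empty set. A real pipeline would need a much larger lexicon and a cutoff tuned on annotated data.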