ITEXT-BIO: Intelligent Term EXTraction for BIOmedical analysis

Kafando, Rodrique; Decoupes, Rémy; Valentin, Sarah; Sautot, Lucile; Teisseire, Maguelonne; Roche, Mathieu

doi:10.1007/s13755-021-00156-6

Cited by 5 publications

(5 citation statements)

References 29 publications

(46 reference statements)

“…According to the review, one of the most commonly used algorithms is C-Value, which handles a hybrid approach and uses the frequency of occurrence and nesting of candidate terms to determine the relevant ones depending on the domain. The authors agree that the algorithm's potential lies in its hybrid characteristic because one part is in charge of applying statistical methods, and subsequently, linguistic filters are applied (Lahbib et al 2015; Bakar et al 2015; Ali and Saad 2016; Du et al 2016; Benabdallah et al 2017; Arora et al 2017; Mykowiecka et al 2018; Kafando et al 2021).…”

Section: What Algorithms Are Used In Automatic Term Extractors? (mentioning)

confidence: 97%
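
The statistical part of C-Value mentioned in the statement above can be illustrated with a minimal sketch. It assumes the commonly cited formulation, in which a candidate's frequency is reduced by the average frequency of the longer candidates that contain it and then scaled by log2 of its length in words. The function names and the toy frequencies are hypothetical, and the candidates are assumed to have already passed a linguistic filter; this is not the implementation of any of the cited systems.

```python
import math

def is_nested(short, long):
    """Return True if token tuple `short` occurs contiguously inside the longer tuple `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))

def c_value(freq):
    """Score candidate terms; `freq` maps token tuples to corpus frequencies (toy input)."""
    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidates that contain this term.
        containers = [f_b for other, f_b in freq.items() if is_nested(term, other)]
        if containers:
            # Discount occurrences explained by longer terms (average container frequency).
            f = f - sum(containers) / len(containers)
        # log2 of the term length favours longer multi-word candidates;
        # single-word candidates score 0 under this formulation.
        scores[term] = math.log2(len(term)) * f
    return scores

# Hypothetical candidate frequencies, for illustration only.
toy_freq = {
    ("cell", "carcinoma"): 8,
    ("basal", "cell", "carcinoma"): 5,
    ("adenoid", "cystic", "basal", "cell", "carcinoma"): 2,
}
scores = c_value(toy_freq)
```

Here "cell carcinoma" is nested in both longer candidates, so its raw frequency of 8 is discounted by their average frequency before scaling.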

“…On the other hand, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm applies weighting methods to express the level of relevance of a candidate term within a text. It is an algorithm used for term extraction, and although there are many systems with different variations, the basis is still TF-IDF (Guo et al 2015; Ali and Saad 2016; Abduljabbar et al 2018; Afrizal et al 2019; Kafando et al 2021).…”

Section: What Algorithms Are Used In Automatic Term Extractors? (mentioning)

confidence: 99%
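
As a rough illustration of the TF-IDF weighting described in the statement above, the sketch below computes the plain variant: relative term frequency within a document multiplied by the natural log of the number of documents divided by the number of documents containing the term. The function name and the toy documents are hypothetical, and the cited systems may use different variations (smoothing, normalization).

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document; `docs` is a list of token lists."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, for illustration only.
toy_docs = [
    ["fever", "cough", "fever"],
    ["cough", "headache"],
    ["cough", "rash"],
]
weights = tf_idf(toy_docs)
```

In this toy corpus "cough" appears in every document, so its weight is zero everywhere, while "fever" is weighted highly in the first document because it is frequent there and absent elsewhere.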

Approaches, tools, algorithms, and methods for automatic term extraction: A systematic literature mapping

Andrade Otálvaro Jaramillo et al. 2023

Preprint

Automatic term extraction is a branch of Natural Language Processing (NLP) used to automatically generate lexicographic materials, such as glossaries, vocabularies, and dictionaries. It allows the creation of standard bases for building unified theories and translations between languages. Scientific literature shows great interest in the construction of automatic term extractors and includes several approaches, tools, algorithms, and methods that can be used for their construction; however, the number of articles in specialized databases is vast, and literature reviews are not recent. This paper presents a systematic literature mapping of the existing material for developing automatic term extractors to provide an overview of approaches, tools, algorithms, and methods used to create them. For this purpose, scientific articles in the domain published between 2015 and 2022 are reviewed and categorized. The mapping results show that among the most used approaches are statistical, with 21.85%; linguistic, with 9.75%; and hybrid, with 68.29%. In addition, there are various computational tools for terminology extraction where authors use different methods for their construction and whose results are measured under the criteria of precision and recall. Finally, 113 documents were selected to answer the research questions and to demonstrate how automatic term extractors are constructed. This paper presents a global summary of primary studies as an essential tool to approach this type of computational system construction.

“…While many strategies for identifying MWEs have been presented in the past (Ramisch et al, 2010; Kafando et al, 2021;, we found that applying them to the medical domain (and especially its clinical counterpart) was challenging due to the extreme corpus size that would be required to produce statistically significant results for the long tail of medical entities.…”

Section: Introduction (mentioning)

confidence: 99%

Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

2023

Lexical collocations: Explored a lot, still a lot more to explore

Lexical collocations, i.e., idiosyncratic binary lexical item combinations, have been an active research topic already for a number of years. State-of-the-art neural network models report to detect and classify specific types of lexical collocations with high accuracy, which might suggest that the problem has been solved. However, a cross-type and cross-language analysis of the results of one of these models raises several relevant research questions. In the first part of my talk, I will present our recent work on the identification and classification of lexical collocations with respect to the fine-grained taxonomy of lexical functions (LFs) in English, French, Spanish and Japanese. Drawing on the outcome of this work, I will focus, in the second part of my talk, on the comparative analysis of the "LF profiles" of English and Japanese material. In particular, I will discuss (i) how the considered LFs are distributed in the given corpora; (ii) how rich the repertoires of the LF instances are in each of them; (iii) whether the contexts of the LF instances overlap; and (iv) to what extent the "profile" of an LF correlates with the accuracy of the recognition of its instances. To conclude, I will formulate the research questions that arise from this analysis.

“…While many strategies for identifying MWEs have been presented in the past (Ramisch et al, 2010; Kafando et al, 2021; Zeng and Bhat, 2021), we found that applying them to the medical domain (and especially its clinical counterpart) was challenging due to the extreme corpus size that would be required to produce statistically significant results for the long tail of medical entities.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting Idiomatic Multiword Expressions in Clinical Terminology using Definition-Based Representation Learning

Remy, Khabibullina, Demeester 2023

Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

This paper shines a light on the potential of definition-based semantic models for detecting idiomatic and semi-idiomatic multiword expressions (MWEs) in clinical terminology. Our study focuses on biomedical entities defined in the UMLS ontology and aims to help prioritize the translation efforts of these entities. In particular, we develop an effective tool for scoring the idiomaticity of biomedical MWEs based on the degree of similarity between the semantic representations of those MWEs and a weighted average of the representation of their constituents. We achieve this using a biomedical language model trained to produce similar representations for entity names and their definitions, called BioLORD. The importance of this definition-based approach is highlighted by comparing the BioLORD model to two other state-of-the-art biomedical language models based on Transformer: SapBERT and CODER. Our results show that the BioLORD model has a strong ability to identify idiomatic MWEs, not replicated in other models. Our corpus-free idiomaticity estimation helps ontology translators to focus on more challenging MWEs.
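
The scoring idea in the abstract above, comparing an MWE's representation with a weighted average of its constituents' representations, can be sketched as follows. This is only an illustration under stated assumptions: the score is taken to be one minus cosine similarity, the weights are uniform, and the vectors are hand-made toy values rather than outputs of BioLORD or any other model. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def weighted_average(vectors, weights):
    """Component-wise weighted mean of a list of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * vec[i] for vec, w in zip(vectors, weights)) / total
            for i in range(dim)]

def idiomaticity(mwe_vec, part_vecs, part_weights):
    """Assumed score: 1 - cosine(MWE vector, weighted average of constituent vectors)."""
    return 1.0 - cosine(mwe_vec, weighted_average(part_vecs, part_weights))

# Toy 2-d vectors standing in for model embeddings (assumption, not real model output).
parts = [[1.0, 0.0], [0.0, 1.0]]
compositional = idiomaticity([1.0, 1.0], parts, [1.0, 1.0])   # MWE aligned with its parts
idiomatic = idiomaticity([-1.0, 1.0], parts, [1.0, 1.0])      # MWE orthogonal to its parts
```

Under this assumed scoring, an expression whose vector matches the average of its constituents scores near 0, and one whose vector diverges from that average scores higher.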
