The automatic processing of clinical documents, such as Electronic Health Records (EHRs), could benefit substantially from the enrichment of medical terminologies with terms encountered in clinical practice. To integrate such terms into existing knowledge sources, they must be linked to corresponding concepts. We present a method for the semantic categorization of clinical terms based on their surface form. We find that features based on sublanguage properties can provide valuable cues for the classification of term variants.
This paper addresses the dynamics of terms and meaning in specialised communication by means of a semantic investigation into the domain of machining terminology in French. Studying meaning in specialised language raises two main research questions: how to identify terms or specialised entities in a technical corpus and how to study their meaning. In order to answer these questions, a double quantitative analysis is conducted, consisting of the identification and quantification of specialised vocabulary as well as the quantification of the semantic analysis by means of a monosemy measure. This approach requires computational tools and a scripting language, and it involves a statistical analysis in order to reach linguistic conclusions. Accordingly, this study aims to question the univocity ideal in a quantitative way. It focuses on the methodology and shows that an interdisciplinary approach can yield valuable results.
This paper explores two tools and methods for keyword extraction. As several tools are available, it compares two widely used ones, namely Lexico3 (Lamalle et al. 2003) and WordSmith Tools (Scott 2013). It shows the importance of keywords and discusses recent studies involving keyword extraction. Since no previous study has attempted to compare two tools that are used by different language communities and that rely on different methodologies to extract keywords, this paper aims to fill this gap by comparing not only the tools and their practical use, but also the underlying methodologies and statistics. By means of a comparative study on a small test corpus, this paper shows major similarities and differences between the tools. The similarities mainly concern the most typical keywords, whereas the differences concern the total number of significant keywords extracted, the granularity of both the probability value and the typicality coefficient, and the type of reference corpus.
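As a rough illustration of the kind of statistic that can underlie keyword extraction, the sketch below scores each word of a study corpus against a reference corpus with a log-likelihood (G2) keyness measure. This measure is an assumption chosen for illustration; it is not claimed to reproduce the exact computation of either Lexico3 or WordSmith Tools. The function names, the toy counts and the 3.84 threshold (the chi-square critical value for p < 0.05 with one degree of freedom) are likewise illustrative.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word, given its frequency
    in a study corpus and in a reference corpus (two-term form)."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected frequencies if the word were equally typical of both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study_counts, ref_counts, threshold=3.84):
    """Return (word, score) pairs for words that are over-represented in
    the study corpus and whose score reaches the threshold, best first."""
    size_study = sum(study_counts.values())
    size_ref = sum(ref_counts.values())
    scored = []
    for word, freq_study in study_counts.items():
        freq_ref = ref_counts.get(word, 0)
        # Keep only positive keywords (relatively more frequent in the study corpus).
        if freq_study / size_study <= freq_ref / size_ref:
            continue
        score = log_likelihood(freq_study, size_study, freq_ref, size_ref)
        if score >= threshold:
            scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy counts, for illustration only.
study = {"milling": 50, "the": 950}
reference = {"milling": 10, "the": 990}
for word, score in keywords(study, reference):
    print(word, round(score, 2))
```

Changing the threshold changes how many keywords count as significant, which is one way in which two tools applied to the same corpus can agree on the most typical keywords yet differ in the total number extracted.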
In the field of Natural Language Processing (NLP), distributional semantic models are the mainstay for modelling lexical semantics on a large scale (Turney & Pantel 2010). They are based on computing the semantic proximity between words from the contexts that the words share. Distributional modelling allows linguists to ground their analyses in large quantities of authentic data and to broaden the empirical base, and thus to detect interesting semantic patterns (Geeraerts, 2010: 165-178). In this article, we give a non-technical presentation of distributional semantic modelling and discuss a lexicological application to the analysis of polysemy. Finally, we present a visualisation method that allows human experts to interpret the semantic structures identified by distributional models.
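The idea of measuring semantic proximity through shared contexts can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the model used in the article: it builds raw co-occurrence count vectors with a symmetric window of two words and compares them with cosine similarity. The function names, the window size and the toy sentence are all hypothetical.

```python
import math
from collections import Counter


def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words found within
    `window` positions to its left or right."""
    vectors = {}
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        contexts = vectors.setdefault(word, Counter())
        for j in range(start, end):
            if j != i:
                contexts[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, for illustration only.
tokens = "the cat drinks milk the dog drinks water".split()
vectors = context_vectors(tokens)

# "cat" and "dog" share more contexts than "cat" and "water" do.
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["water"]))
```

Models applied to real corpora typically replace raw counts with association weights and use far larger corpora, but the principle is the same: words that occur in similar contexts receive similar vectors.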
This article shows how new approaches in corpus analysis can enrich traditional lexicographic descriptions. We examine a set of trend verbs (i.e., verbs denoting an increase) in English, French and Dutch, building on analyses of parallel corpora and of targeted monolingual corpora. Parallel corpora give information about the frequency of use and the equivalence of translations. These quantitative data are submitted to MDS (MultiDimensional Scaling) analyses in order to establish the translation profiles of the verbs. The targeted monolingual corpora allow us to refine these results and to extract salient collocates, which show the combinatorial properties of the trend verbs. The results of these corpus analyses, in terms of translation profiles and combinatorial profiles, can be used to enrich traditional lexicographic descriptions in translation dictionaries. The methodology and results of the corpus analyses, as well as some challenges for lexicography, will be presented.