Serguei Pakhomov scite author profile

Measures of semantic similarity between concepts are widely used in Natural Language Processing. In this article, we show how six existing domain-independent measures can be adapted to the biomedical domain. These measures were originally based on WordNet, an English lexical database of concepts and relations. In this research, we adapt these measures to the SNOMED-CT ontology of medical concepts. The measures include two path-based measures, and three measures that augment path-based measures with information content statistics from corpora. We also derive a context vector measure based on medical corpora that can be used as a measure of semantic relatedness. These six measures are evaluated against a newly created test bed of 30 medical concept pairs scored by three physicians and nine medical coders. We find that the medical coders and physicians differ in their ratings, and that the context vector measure correlates most closely with the physicians, while the path-based measures and one of the information content measures correlate most closely with the medical coders. We conclude that there is a role both for more flexible measures of relatedness based on information derived from corpora, as well as for measures that rely on existing ontological structures.
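As a rough illustration of what a path-based measure computes, the sketch below scores two concepts by the inverse of the number of nodes on the shortest is-a path between them. The hierarchy, the concept names, and the function names are hypothetical and are not taken from SNOMED-CT or from the paper, and the paper's own measures may be defined differently.

```python
# Toy is-a hierarchy (child -> parent). Concept names are illustrative only
# and are NOT taken from SNOMED-CT.
PARENT = {
    "heart failure": "heart disease",
    "myocardial infarction": "heart disease",
    "heart disease": "cardiovascular disease",
    "stroke": "cardiovascular disease",
    "cardiovascular disease": "disease",
}


def ancestors(concept):
    """Return the chain from a concept up to the root, concept first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """Return 1 / (number of nodes on the shortest is-a path from a to b)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    steps_in_b = {concept: i for i, concept in enumerate(chain_b)}
    # Walking up from a, the first concept also found above b is the
    # lowest common ancestor in a tree-shaped hierarchy.
    for steps_in_a, concept in enumerate(chain_a):
        if concept in steps_in_b:
            edges = steps_in_a + steps_in_b[concept]
            return 1.0 / (edges + 1)
    return 0.0  # no shared ancestor


if __name__ == "__main__":
    print(path_similarity("heart failure", "myocardial infarction"))
    print(path_similarity("heart failure", "stroke"))
```

Siblings under the same parent score higher than concepts that only meet further up the hierarchy, which is the intuition that the information content measures then refine with corpus statistics.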

CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines

Soysal

et al. 2017

Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.

New Perspectives on the Aging Lexicon

Wulff

Deyne

Jones

et al. 2019

Trends in Cognitive Sciences

106

117

The field of cognitive aging has seen considerable advances in describing the linguistic and semantic changes that happen during the adult life span to uncover the structure of the mental lexicon (i.e., the mental repository of lexical and conceptual representations). Nevertheless, there is still debate concerning the sources of these changes, including the role of environmental exposure and several cognitive mechanisms associated with learning, representation, and retrieval of information. We review the current status of research in this field and outline a framework that promises to assess the contribution of both ecological and psychological aspects to the aging lexicon.

Cognitive Aging and the Mental Lexicon

There is consensus in the cognitive sciences that human development extends well beyond childhood and adolescence, and there has been remarkable empirical progress in the field of cognitive aging in past decades [1]. Nevertheless, the role of environmental and cognitive factors in age-related changes in the structure and processing of lexical and semantic representations (see Glossary) is still under debate. For example, age-related memory decline is commonly attributed to a decline in cognitive abilities [2,3], yet some researchers have proposed that massive exposure to language over the course of one's life leads to knowledge gains that may contribute to, if not fully account for, age-related memory deficits [4-6]. We argue that to resolve such debates we require an interdisciplinary approach that captures how information exposure across adulthood may change the way that we acquire, represent, and recall information. We summarize recent developments in the field (Figure 1, Table 1) and propose a conceptual framework (Figure 2, Key Figure) and associated research agenda that argues for combining ecological analyses, formal modeling, and large-scale empirical studies to shed light on the contents, structure, and neural basis of the aging mental lexicon in both health and disease.

Mental Lexicon: Aging and Cognitive Performance

The mental lexicon can be thought of as a repository of lexical and conceptual representations, composed of organized networks of semantic, phonological, orthographic, morphological, and other types of information [7]. The cognitive sciences have provided considerable knowledge about the computational (Box 1; [8-11]) and neural basis (Box 2; [12,13]) of lexical and semantic cognition, and there has been considerable interest in how such aspects of cognition change across adulthood and aging [14,15]. Past work on the aging lexicon emphasized the amount of information acquired across the life span (e.g., vocabulary gains across adulthood; [15]); however, new evaluations using graph-based approaches suggest that both quantity and structural aspects of representations differ between individuals [16] and change across the life span [17-19]. Such insights were gathered, for example, from a large-scale analysis of free association data from thousands of individuals [17], ranging from 10 to ...

Corpus domain effects on distributional semantic modeling of medical terms

Pakhomov

Finley

McEwan

et al. 2016

Motivation: Automatically quantifying semantic similarity and relatedness between clinical terms is an important aspect of text mining from electronic health records, which are increasingly recognized as valuable sources of phenotypic information for clinical genomics and bioinformatics research. A key obstacle to development of semantic relatedness measures is the limited availability of large quantities of clinical text to researchers and developers outside of major medical centers. Text from general English and biomedical literature are freely available; however, their validity as a substitute for clinical domain to represent semantics of clinical terms remains to be demonstrated. Results: We constructed neural network representations of clinical terms found in a publicly available benchmark dataset manually labeled for semantic similarity and relatedness. Similarity and relatedness measures computed from text corpora in three domains (Clinical Notes, PubMed Central articles and Wikipedia) were compared using the benchmark as reference. We found that measures computed from full text of biomedical articles in PubMed Central repository (rho = 0.62 for similarity and 0.58 for relatedness) are on par with measures computed from clinical reports (rho = 0.60 for similarity and 0.57 for relatedness). We also evaluated the use of neural network based relatedness measures for query expansion in a clinical document retrieval task and a biomedical term word sense disambiguation task. We found that, with some limitations, biomedical articles may be used in lieu of clinical reports to represent the semantics of clinical terms and that distributional semantic methods are useful for clinical and biomedical natural language processing applications.
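The rho values reported above are rank correlations between model scores and human ratings. A minimal sketch of that style of evaluation follows, assuming cosine similarity between term vectors as the model score and Spearman's rank correlation as the comparison; the helper names and any vectors or ratings passed in are hypothetical, not the paper's data or code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


if __name__ == "__main__":
    # Hypothetical model scores and human ratings for four term pairs.
    model_scores = [0.91, 0.35, 0.60, 0.12]
    human_ratings = [3.8, 1.5, 2.0, 1.0]
    print(spearman(model_scores, human_ratings))
```

Because only the ordering of the scores matters, this comparison is insensitive to the scale of the similarity values produced by different corpora.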

Automating the Assignment of Diagnosis Codes to Patient Encounters Using Example-based and Machine Learning Techniques

Pakhomov

Buntrock

Chute

2006

Journal of the American Medical Informatics Association

120

Paradoxical lucidity: A potential paradigm shift for the neurobiology and treatment of severe dementias

Mashour

Frank

Batthyány

et al. 2019

Alzheimer's & Dementia

Unexpected cognitive lucidity and communication in patients with severe dementias, especially around the time of death, have been observed and reported anecdotally. Here, we review what is known about this phenomenon, related phenomena that provide insight into potential mechanisms, ethical implications, and methodologic considerations for systematic investigation. We conclude that paradoxical lucidity, if systematically confirmed, challenges current assumptions and highlights the possibility of network‐level return of cognitive function in cases of severe dementias, which can provide insight into both underlying neurobiology and future therapeutic possibilities.

Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier

Pakhomov

Buntrock

Chute

2005

Journal of Biomedical Informatics

This paper addresses a very specific problem of identifying patients diagnosed with a specific condition for potential recruitment in a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes dictated at Mayo Clinic. This method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors, where features are a mix of demographic information as well as single words and concept mappings to MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term spotting method with respect to their accuracy and recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples (95 vs. 86%) but worse accuracy than Perceptron (57 vs. 65%). Both algorithms perform better than the baseline with recall on positive samples of 71% and accuracy of 54%.
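The two figures compared above, accuracy and recall on positive samples, can move in opposite directions, which is why a classifier may win on one and lose on the other. The sketch below shows how both are computed from binary labels; the function name and the example labels are hypothetical and do not reproduce the paper's data.

```python
def accuracy_and_positive_recall(gold, predicted):
    """Return (accuracy, recall on positive samples) for binary labels.

    gold and predicted are equal-length sequences of 0/1 labels,
    where 1 marks a positive sample.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    positives = sum(1 for g in gold if g == 1)
    true_positives = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    accuracy = correct / len(gold)
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall


if __name__ == "__main__":
    # Hypothetical labels: 1 = note belongs to a patient with the condition.
    gold = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 1, 0]
    print(accuracy_and_positive_recall(gold, predicted))
```

A classifier that labels almost everything positive gets high recall on positives but pays for it in accuracy, which matches the kind of trade-off described in the abstract.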
