Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings

Newman-Griffis, Denis; Fosler‐Lussier, Eric

doi:10.18653/v1/d19-6218

Cited by 4 publications (6 citation statements)

References 22 publications


“…Table 5). The best systems peak at 85% F1 score for Advice (a distance of more than 13 percentage points to the top recognition results for medication attributes); they slip to 78%¹³ and 77% for Mechanism and Effect, respectively, and plummet to 59% for Interaction¹⁴. Differences between the first…”

Footnote 13: Xu et al [86] even reach slightly more than 79% F1 score for Mechanism (using UMLS-based concept embeddings with a Bi-LSTM approach), but substantially fall below the results for the other three relation types in comparison with all the systems mentioned in Table 6.

Section: Drug-drug Interaction (mentioning, confidence 94%)

“…the seminal descriptive work distinguishing both these sublanguage types by Friedman et al [13]). Newman-Griffis and Fosler-Lussier [14] investigated different sublanguage patterns for the many varieties of clinical reports (pathology reports, discharge summaries, nurse and Intensive Care Unit notes, etc.), while Nunez and Carenini [15] discussed the portability of embeddings across various fields of medicine reflecting characteristic sublanguage use patterns.…”

Section: Medical Information Mart (mentioning, confidence 99%)

“…Differences between the first and second-ranked systems are typically small, yet become larger on subsequent ranks (roughly between 3 to 4 percentage points relative to the top-ranked system).…”

Footnote 14: Dewi et al [87] and Sun et al [88] report on 86.3% and 84.5% F1 scores, respectively, for the overall relation classification task, both using a multi-layered CNN archi[tecture]…

Section: Drug-drug Interaction (mentioning, confidence 99%)

Medical Information Extraction in the Age of Deep Learning

Hahn; Oleynik. Yearb Med Inform, 2020.

Objectives: We survey recent developments in medical Information Extraction (IE) as reported in the literature from the past three years. Our focus is on the fundamental methodological paradigm shift from standard Machine Learning (ML) techniques to Deep Neural Networks (DNNs). We describe applications of this new paradigm concentrating on two basic IE tasks, named entity recognition and relation extraction, for two selected semantic classes—diseases and drugs (or medications)—and relations between them. Methods: For the time period from 2017 to early 2020, we searched for relevant publications from three major scientific communities: medicine and medical informatics, natural language processing, as well as neural networks and artificial intelligence. Results: In the past decade, the field of Natural Language Processing (NLP) has undergone a profound methodological shift from symbolic to distributed representations based on the paradigm of Deep Learning (DL). Meanwhile, this trend is, although with some delay, also reflected in the medical NLP community. In the reporting period, overwhelming experimental evidence has been gathered, as illustrated in this survey for medical IE, that DL-based approaches outperform non-DL ones by often large margins. Still, small-sized and access-limited corpora create intrinsic problems for data-greedy DL as do special linguistic phenomena of medical sublanguages that have to be overcome by adaptive learning strategies. Conclusions: The paradigm shift from (feature-engineered) ML to DNNs changes the fundamental methodological rules of the game for medical NLP. This change is by no means restricted to medical IE but should also deeply influence other areas of medical informatics, either NLP- or non-NLP-based.


“…In addition, there is significant research into strategies for learning neural representations of entities in knowledge bases and coding systems. Past work has investigated diverse approaches, such as leveraging rich semantic information from knowledge base structure and web-scale annotated corpora (34,97,98), utilizing definitions of word senses (similar to our use of ICF definitions) (99,100), and combining terminologies with targeted selection of training corpora to learn application-tailored concept representations (101,102). While most of the research on entity representations requires resources not yet available for FSI (e.g., large, annotated corpora; well-developed terminologies; robust and interconnected knowledge graph structure), all present significant opportunities to advance FSI coding technologies as more resources are developed.…”

Section: Alternative Coding Approaches (mentioning, confidence 99%)

Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health

Newman-Griffis; Fosler‐Lussier. Front. Digit. Health, 2021. Self-citation.

Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts, such as functional outcomes and social determinants of health, lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of medical information in under-studied domains, and demonstrate its applicability through a case study on physical mobility function. Mobility function is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is represented as one domain of human activity in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in the medical informatics literature, and neither the ICF nor commonly-used medical terminologies capture functional status terminology in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility status to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro-averaged F-1 score of 84% on linking mobility activity reports to ICF codes. Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. 
This research has implications for continued development of language technologies to analyze functional status information, and the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.


“…However, quantitative, vector-based comparison of embedding spaces faces significant conceptual challenges, such as a lack of appropriate alignment objectives and empirical instability (Gonen et al, 2020). While nearest neighbor-based change measurement has been proposed (Newman-Griffis and Fosler-Lussier, 2019; Gonen et al, 2020), its efficacy for small corpora with limited vocabularies remains to be determined. Our novel embedding confidence measure offers a step in this direction (see §6.3 for further discussion), but further research is needed.…”

Section: Mining Shifts In the Literature (mentioning, confidence 99%)

TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

Newman-Griffis; Sivaraman; Perer; et al. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua…, 2021. Self-citation.

Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence can be found at https://textessence.github.io.
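A minimal sketch of neighbor-based comparison of two embedding spaces, in the spirit of the nearest neighbor-based change measurement and the neighborhood-overlap confidence measure mentioned above. Assumptions, none taken from the cited papers: embeddings are plain Python dicts of toy vectors, neighbors are ranked by cosine similarity over the vocabulary shared by both spaces, and a term's score is the fraction of its top-k neighbors that the two spaces have in common (similar in spirit to intersecting top-k neighbor lists). The function names, the toy vectors, and `replicate_confidence` (mean pairwise overlap across embeddings trained repeatedly on one corpus) are hypothetical illustrations, not necessarily how TextEssence or Newman-Griffis and Fosler-Lussier define their measures.

```python
import itertools
import math

# Hypothetical representation: each term maps to a dense vector.
Embeddings = dict[str, list[float]]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_neighbors(term: str, emb: Embeddings, vocab: list[str], k: int) -> set[str]:
    """Return the k words in `vocab` (excluding `term`) most cosine-similar to `term`."""
    query = emb[term]
    candidates = [w for w in vocab if w != term]
    # Sort by descending similarity; break ties alphabetically for determinism.
    candidates.sort(key=lambda w: (-cosine(query, emb[w]), w))
    return set(candidates[:k])


def neighbor_overlap(term: str, emb_a: Embeddings, emb_b: Embeddings, k: int = 10) -> float:
    """Fraction of `term`'s top-k neighbors shared by two embedding spaces.

    1.0 means identical neighborhoods; values near 0.0 suggest a usage shift.
    """
    # Only words present in both spaces are eligible, so the two lists are comparable.
    shared = sorted(set(emb_a) & set(emb_b))
    if term not in shared:
        raise KeyError(f"{term!r} is not in both vocabularies")
    k = min(k, len(shared) - 1)
    if k < 1:
        raise ValueError("need at least one shared neighbor candidate")
    near_a = top_k_neighbors(term, emb_a, shared, k)
    near_b = top_k_neighbors(term, emb_b, shared, k)
    return len(near_a & near_b) / k


def replicate_confidence(term: str, replicates: list[Embeddings], k: int = 10) -> float:
    """Hypothetical confidence: mean pairwise neighbor overlap across replicate embeddings."""
    pairs = list(itertools.combinations(replicates, 2))
    if not pairs:
        raise ValueError("need at least two replicate embeddings")
    return sum(neighbor_overlap(term, a, b, k) for a, b in pairs) / len(pairs)


# Toy usage: two hypothetical note types in which "cold" is used differently.
notes_a = {"cold": [1.0, 0.0], "flu": [0.9, 0.1], "fever": [0.8, 0.2],
           "ice": [0.0, 1.0], "snow": [0.1, 0.9]}
notes_b = {"cold": [0.0, 1.0], "flu": [0.9, 0.1], "fever": [0.8, 0.2],
           "ice": [0.05, 1.0], "snow": [0.1, 0.9]}
print(neighbor_overlap("cold", notes_a, notes_b, k=2))  # 0.0
print(neighbor_overlap("flu", notes_a, notes_b, k=2))   # 0.5
```

A low overlap flags a term as a candidate usage shift between corpora; as the snippet above notes, how reliable such neighbor-based measures are for small corpora with limited vocabularies is still an open question.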
