Cheryl Clark scite author profile

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.


MedXN: an open source medication extraction and normalization tool for clinical text

Sohn

Clark

Halgrim

et al. 2014


Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text

Carrell

Malin

Aberdeen

et al. 2013

J Am Med Inform Assoc


Identifying Smokers with a Medical Extraction System

Clark

Good

Jezierny

et al. 2008

Journal of the American Medical Informatics Association


The Clinical Language Understanding group at Nuance Communications has developed a medical information extraction system that combines a rule-based extraction engine with machine learning algorithms to identify and categorize references to patient smoking in clinical reports. The extraction engine identifies smoking references; documents that contain no smoking references are classified as UNKNOWN. For the remaining documents, the extraction engine uses linguistic analysis to associate features such as status and time to smoking mentions. Machine learning is used to classify the documents based on these features. This approach shows overall accuracy in the 90s on all data sets used. Classification using engine-generated and word-based features outperforms classification using only word-based features for all data sets, although the difference gets smaller as the data set size increases. These techniques could be applied to identify other risk factors, such as drug and alcohol use, or a family history of a disease.
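The two-stage design described in this abstract can be illustrated with a minimal Python sketch. It is not the Nuance system: the cue patterns, the feature names, and every label other than UNKNOWN are hypothetical, and a hand-written decision function stands in for the machine learning classifier, whose details the abstract does not give.

```python
import re

# Hypothetical cue patterns; the real extraction engine's rules are not
# described in the abstract.
SMOKING_RE = re.compile(r"smok\w*|tobacco|cigarette", re.IGNORECASE)
NEGATION_RE = re.compile(
    r"\b(?:no|not|never|denies|denied)\b|\bnon-?smok", re.IGNORECASE
)
PAST_RE = re.compile(r"\b(?:quit|former|stopped|years ago)\b", re.IGNORECASE)


def extract_features(text):
    """Stage 1: find smoking mentions and attach status/time features.

    Returns one feature dict per sentence that mentions smoking.
    """
    features = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if SMOKING_RE.search(sentence):
            features.append({
                "negated": bool(NEGATION_RE.search(sentence)),
                "past": bool(PAST_RE.search(sentence)),
            })
    return features


def classify(text):
    """Stage 2: label a document from the extracted features."""
    features = extract_features(text)
    # Documents with no smoking references are classified as UNKNOWN,
    # as stated in the abstract.
    if not features:
        return "UNKNOWN"
    # Assumption: simple rules replace the trained classifier here, and the
    # three labels below are illustrative, not taken from the abstract.
    if any(f["past"] for f in features):
        return "PAST SMOKER"
    if all(f["negated"] for f in features):
        return "NON-SMOKER"
    return "CURRENT SMOKER"


if __name__ == "__main__":
    for note in [
        "Patient presents with chest pain.",
        "He denies tobacco use.",
        "She quit smoking 5 years ago.",
        "Smokes one pack per day.",
    ]:
        print(classify(note), "<-", note)
```

In a setup closer to the one the abstract describes, the feature dictionaries would be combined with word-based features and passed to a trained classifier rather than to fixed rules.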



Cheryl Clark

The MITRE Identification Scrubber Toolkit: Design, training, and assessment

Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing
