CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines

Soysal, Ergin; Wang, Jingqi; Jiang, Min; Wu, Yonghui; Pakhomov, Serguei; Liu, Hongfang; Xu, Hua

doi:10.1093/jamia/ocx132

Cited by 273 publications

(192 citation statements)

References 26 publications

Supporting

Mentioning

191

Contrasting

Unclassified


“…NLP-extracted medication dose processing module. Pro-Med-NLP can process output from medExtractR 16 or three other NLP systems for medication information extraction: MedEx, 13 CLAMP, 14 and MedXN 15 (Figure 3). As it was challenging to process the raw extracted data, especially for drugs prescribed multiple times a day, we developed a rigorous postprocessing algorithm that was implemented in Pro-Med-NLP.…”

Section: Postextraction Data Processing Procedures (mentioning)

confidence: 99%

Development of a System for Postmarketing Population Pharmacokinetic and Pharmacodynamic Studies Using Real‐World Data From Electronic Health Records

Choi

Beck

McNeer

et al. 2020

Clin Pharma and Therapeutics


Postmarketing population pharmacokinetic (PK) and pharmacodynamic (PD) studies can be useful to capture patient characteristics affecting PK or PD in real‐world settings. These studies require longitudinally measured dose, outcomes, and covariates in large numbers of patients; however, prospective data collection is cost‐prohibitive. Electronic health records (EHRs) can be an excellent source for such data, but there are challenges, including accurate ascertainment of drug dose. We developed a standardized system to prepare datasets from EHRs for population PK/PD studies. Our system handles a variety of tasks involving data extraction from clinical text using a natural language processing algorithm, data processing, and data building. Applying this system, we performed a fentanyl population PK analysis, resulting in comparable parameter estimates to a prior study. This new system makes the EHR data extraction and preparation process more efficient and accurate and provides a powerful tool to facilitate postmarketing population PK/PD studies using information available in EHRs.


“…We used machine learning features that reported to be useful for clinical NER in previous studies, including word n-grams, prefixes, suffixes, word shape (combination patterns of uppercase and lowercase letters, numbers), sentence-level features (sentence length, whether the sentence is a part of list), brown clustering, and discrete word embedding. [40, 41] The discrete word embedding features were derived by converting the real numbers in the word embedding into discrete categories in [POSITIVE, NEGATIVE, NEUTRAL]. For each dimension of word embedding, we calculated the positive mean value – the arithmetic mean among all positive values of this dimension, and the negative mean – the arithmetic mean among all negative values of this dimension.…”

Section: Methods (mentioning)

confidence: 99%

MADEx: A System for Detecting Medications, Adverse Drug Events, and Their Relations from Clinical Notes

et al. 2019

Self Cite


Introduction: Early detection of Adverse Drug Events (ADEs) from Electronic Health Records (EHRs) is an important, challenging task to support pharmacovigilance and drug safety surveillance. A well-known challenge in using clinical text for detection of ADEs is that much of the detailed information is documented in a narrative manner. Clinical Natural Language Processing (NLP) is the key technology to extract information from unstructured clinical text. Objective: We present a machine learning-based clinical NLP system, MADEx, for detecting medications, ADEs, and their relations from clinical notes. Methods: We developed a Recurrent Neural Network (RNN) model using a Long Short-Term Memory (LSTM) strategy for clinical Named Entity Recognition (NER) and compared it with a baseline Conditional Random Fields (CRFs) model. We developed a modified training strategy for the RNN, which outperformed the widely used early stop strategy. For relation extraction, we compared Support Vector Machines (SVMs) and Random Forests on single-sentence relations and cross-sentence relations. We also developed an integrated pipeline to extract entities and relations together by combining RNN and SVMs. Results: MADEx achieved a top-three performance (F1-score of 0.8233) for clinical NER in the 2018 Medication and Adverse Drug Events (MADE1.0) challenge. The post-challenge evaluation showed that the relation extraction module and integrated pipeline (identifying entities and relations together) of MADEx are comparable to the best systems developed in this challenge. Conclusion: This study demonstrated the efficiency of deep learning methods for automatic extraction of medications, ADEs, and their relations from clinical text to support pharmacovigilance and drug safety surveillance.
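The citation statement for this entry describes discrete word embedding features: for each embedding dimension, a positive mean and a negative mean are computed, and real values are mapped to POSITIVE, NEGATIVE, or NEUTRAL. The excerpt is cut off before it says how the two means are applied, so the following Python sketch assumes one plausible rule: values at or above the positive mean become POSITIVE, values at or below the negative mean become NEGATIVE, and everything else becomes NEUTRAL. The function name `discretize_embeddings` and the toy vectors are hypothetical and are not taken from MADEx or CLAMP.

```python
from typing import List


def discretize_embeddings(vectors: List[List[float]]) -> List[List[str]]:
    """Map each embedding component to POSITIVE, NEGATIVE, or NEUTRAL.

    Assumed rule (not stated in the excerpt): compare each value with the
    per-dimension mean of positive values and mean of negative values.
    """
    if not vectors:
        return []
    dims = len(vectors[0])

    pos_means: List[float] = []
    neg_means: List[float] = []
    for d in range(dims):
        column = [v[d] for v in vectors]
        positives = [x for x in column if x > 0]
        negatives = [x for x in column if x < 0]
        # With no positive (or negative) values, use an unreachable threshold.
        pos_means.append(sum(positives) / len(positives) if positives else float("inf"))
        neg_means.append(sum(negatives) / len(negatives) if negatives else float("-inf"))

    labels: List[List[str]] = []
    for v in vectors:
        row = []
        for d, x in enumerate(v):
            if x >= pos_means[d]:
                row.append("POSITIVE")
            elif x <= neg_means[d]:
                row.append("NEGATIVE")
            else:
                row.append("NEUTRAL")
        labels.append(row)
    return labels


# Toy two-dimensional embeddings for three words (illustrative values only).
toy_vectors = [[0.2, -0.4], [0.6, -0.1], [-0.3, 0.5]]
toy_labels = discretize_embeddings(toy_vectors)
```

Each resulting label can then be used as a categorical feature for a sequence model such as a CRF, which is the usual reason for discretizing real-valued embeddings.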


“…A rule-based NER was carried out in the Clinical Language Annotation, Modeling, and Processing Toolkit (CLAMP) [38]—a Natural Language Processing (NLP) software—and was handled by a pipeline that included a sentence detector, a tokenizer, and a dictionary lookup component. The input to this pipeline included the preprocessed publications as well as a method dictionary with semantic labels generated from the Method Ontology.…”

Section: Methods (mentioning)

confidence: 99%

Developing a healthcare dataset information resource (DIR) based on Semantic Web

et al. 2018


Background: The right dataset is essential to obtain the right insights in data science; therefore, it is important for data scientists to have a good understanding of the availability of relevant datasets as well as the content, structure, and existing analyses of these datasets. While a number of efforts are underway to integrate the large amount and variety of datasets, the lack of an information resource that focuses on specific needs of target users of datasets has existed as a problem for years. To address this gap, we have developed a Dataset Information Resource (DIR), using a user-oriented approach, which gathers relevant dataset knowledge for specific user types. In the present version, we specifically address the challenges of entry-level data scientists in learning to identify, understand, and analyze major datasets in healthcare. We emphasize that the DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses. Methods: The DIR leverages Semantic Web technologies and the W3C Dataset Description Profile as the standard for knowledge integration and representation. To extract tailored knowledge for target users, we have developed methods for manual extractions from dataset documentations as well as semi-automatic extractions from related publications, using natural language processing (NLP)-based approaches. A semantic query component is available for knowledge retrieval, and a parameterized question-answering functionality is provided to facilitate the ease of search. Results: The DIR prototype is composed of four major components: dataset metadata and related knowledge, search modules, question answering for frequently-asked questions, and blogs. The current implementation includes information on 12 commonly used large and complex healthcare datasets. The initial usage evaluation based on health informatics novices indicates that the DIR is helpful and beginner-friendly. Conclusions: We have developed a novel user-oriented DIR that provides dataset knowledge specialized for target user groups. Knowledge about datasets is effectively represented in the Semantic Web. At this initial stage, the DIR has already been able to provide sophisticated and relevant knowledge of 12 datasets to help entry-level health informaticians learn healthcare data analysis using suitable datasets. Further development of both content and function levels is underway.
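The citation statement for this entry describes a rule-based NER pipeline made of a sentence detector, a tokenizer, and a dictionary lookup component fed by a method dictionary with semantic labels. The Python sketch below illustrates that three-stage idea in a generic way. It is not CLAMP's API (CLAMP is a separate toolkit with its own components), and every function name, the regular expressions, the longest-match strategy, and the sample dictionary entries are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence detector: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Naive tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", sentence.lower())


def dictionary_lookup(
    tokens: List[str], dictionary: Dict[str, str], max_len: int = 3
) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of token n-grams against the dictionary."""
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                matches.append((phrase, dictionary[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts at this token
    return matches


def run_pipeline(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Chain the three stages: sentences, then tokens, then dictionary matches."""
    results: List[Tuple[str, str]] = []
    for sentence in split_sentences(text):
        results.extend(dictionary_lookup(tokenize(sentence), dictionary))
    return results


# Hypothetical method dictionary: phrase -> semantic label.
method_dictionary = {
    "logistic regression": "StatisticalMethod",
    "survival analysis": "StatisticalMethod",
}
sample_text = "We fitted a logistic regression model. Survival analysis was also performed."
entities = run_pipeline(sample_text, method_dictionary)
```

Keeping each stage as a separate function mirrors the component-based design the statement describes: any stage can be swapped (for example, a better sentence detector) without touching the others.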
