Dmitriy Dligach scite author profile

Towards comprehensive syntactic and semantic annotations of the clinical narrative

Objective: To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods: Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed. Results: The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations. Conclusions: This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.


Large-scale identification of patients with cerebral aneurysms using natural language processing

Castro

Dligach

Finan

et al. 2017

Neurology


Objective: To use natural language processing (NLP) in conjunction with the electronic medical record (EMR) to accurately identify patients with cerebral aneurysms and their matched controls. Methods: ICD-9 and Current Procedural Terminology codes were used to obtain an initial data mart of potential aneurysm patients from the EMR. NLP was then used to train a classification algorithm with .632 bootstrap cross-validation used for correction of overfitting bias. The classification rule was then applied to the full data mart. Additional validation was performed on 300 patients classified as having aneurysms. Controls were obtained by matching age, sex, race, and healthcare use. Results: We identified 55,675 patients of 4.2 million patients with ICD-9 and Current Procedural Terminology codes consistent with cerebral aneurysms. Of those, 16,823 patients had the term aneurysm occur near relevant anatomic terms. After training, a final algorithm consisting of 8 coded and 14 NLP variables was selected, yielding an overall area under the receiver operating characteristic curve of 0.95. After the final algorithm was applied, 5,589 patients were classified as having aneurysms, and 54,952 controls were matched to those patients. The positive predictive value based on a validation cohort of 300 patients was 0.86. Conclusions: We harnessed the power of the EMR by applying NLP to obtain a large cohort of patients with intracranial aneurysms and their matched controls. Such algorithms can be generalized to other diseases for epidemiologic and genetic studies. Cerebral aneurysm is a potentially devastating disorder that affects nearly 3% of the population.
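The .632 bootstrap named in the Methods corrects the optimism of training-set error by blending it with out-of-bag error. The following is a minimal Python sketch under stated assumptions, not the study's pipeline: the nearest-centroid classifier, the function names, and the synthetic data are hypothetical stand-ins, and the out-of-bag error is averaged per resample, which is a common simplification of the per-observation estimate.

```python
import random


def fit_centroids(X, y):
    """Toy stand-in classifier (an assumption): mean feature vector per class."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(col) / len(col) for col in zip(*rows)]
    return model


def predict_one(model, x):
    # Assign the class whose centroid is closest (squared Euclidean distance).
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def error_rate(model, X, y):
    wrong = sum(predict_one(model, x) != lab for x, lab in zip(X, y))
    return wrong / len(y)


def bootstrap_632_error(X, y, n_boot=200, seed=0):
    """Return (.632 estimate, apparent error, mean out-of-bag error)."""
    rng = random.Random(seed)
    n = len(y)
    # Apparent error: fit and evaluate on the same data (optimistic).
    apparent = error_rate(fit_centroids(X, y), X, y)
    oob_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        drawn = set(idx)
        oob = [i for i in range(n) if i not in drawn]  # rows never drawn
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if not oob or set(yb) != set(y):
            continue  # skip degenerate resamples
        model = fit_centroids(Xb, yb)
        oob_errors.append(
            error_rate(model, [X[i] for i in oob], [y[i] for i in oob])
        )
    oob_mean = sum(oob_errors) / len(oob_errors)
    # Weighted blend: 0.368 * apparent error + 0.632 * out-of-bag error.
    return 0.368 * apparent + 0.632 * oob_mean, apparent, oob_mean


if __name__ == "__main__":
    # Synthetic two-class data, purely illustrative.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(50)]
    y = [0] * 50 + [1] * 50
    print(bootstrap_632_error(X, y))
```

The weights reflect that a bootstrap resample contains about 63.2% of the distinct original observations on average, so out-of-bag error alone tends to be pessimistic and apparent error alone optimistic.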


Association of intracranial aneurysm rupture with smoking duration, intensity, and cessation

et al. 2017


Multilayered temporal modeling for the clinical domain

Lin

Dligach

Miller

et al. 2015


Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium

Pathak

Bailey

Beebe

et al. 2013

J Am Med Inform Assoc


End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts.


Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record

Lin

Karlson

Dligach

et al. 2014


Natural language processing and machine learning to identify alcohol misuse from the electronic health record in trauma patients: development and internal validation

Afshar

Phillips

Karnik

et al. 2019


Objective: Alcohol misuse is present in over a quarter of trauma patients. Information in the clinical notes of the electronic health record of trauma patients may be used for phenotyping tasks with natural language processing (NLP) and supervised machine learning. The objective of this study is to train and validate an NLP classifier for identifying patients with alcohol misuse. Materials and Methods: An observational cohort of 1422 adult patients admitted to a trauma center between April 2013 and November 2016. Linguistic processing of clinical notes was performed using the clinical Text Analysis and Knowledge Extraction System. The primary analysis was the binary classification of alcohol misuse. The Alcohol Use Disorders Identification Test served as the reference standard. Results: The data corpus comprised 91 045 electronic health record notes and 16 091 features. In the final machine learning classifier, 16 features were selected from the first 24 hours of notes for identifying alcohol misuse. The classifier’s performance in the validation cohort had an area under the receiver-operating characteristic curve of 0.78 (95% confidence interval [CI], 0.72 to 0.85). Sensitivity and specificity were at 56.0% (95% CI, 44.1% to 68.0%) and 88.9% (95% CI, 84.4% to 92.8%). The Hosmer-Lemeshow goodness-of-fit test demonstrates the classifier fits the data well (P = .17). A simpler rule-based keyword approach had a decrease in sensitivity when compared with the NLP classifier from 56.0% to 18.2%. Conclusions: The NLP classifier has adequate predictive validity for identifying alcohol misuse in trauma centers. External validation is needed before its application to augment screening.
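Sensitivity and specificity, the two measures reported in the Results, can be computed from a classifier's predictions against a reference standard. The following Python sketch is illustrative only: the function name and the ten example labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sensitivity: share of true cases that the classifier flags.
    # Specificity: share of non-cases that the classifier leaves unflagged.
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical labels: 4 reference-standard positives, 6 negatives.
    reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    print(sensitivity_specificity(reference, predicted))
```

Running the same function on the output of two approaches against one reference standard gives the kind of side-by-side sensitivity comparison the abstract reports for the keyword rule and the NLP classifier.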


Lipid-Lowering Agents and High HDL (High-Density Lipoprotein) Are Inversely Associated With Intracranial Aneurysm Rupture

Can

Castro

Dligach

et al. 2018

Stroke


Background and Purpose: Growing evidence from experimental animal models and clinical studies suggests the protective effect of statin use against rupture of intracranial aneurysms; however, results from large studies detailing the relationship between intracranial aneurysm rupture and total cholesterol, HDL (high-density lipoprotein), LDL (low-density lipoprotein), and lipid-lowering agent use are lacking. Methods: The medical records of 4701 patients with 6411 intracranial aneurysms diagnosed at the Massachusetts General Hospital and the Brigham and Women's Hospital between 1990 and 2016 were reviewed and analyzed. Patients were separated into ruptured and nonruptured groups. Univariable and multivariable logistic regression analyses were performed to determine the effects of lipids (total cholesterol, LDL, and HDL) and lipid-lowering medications on intracranial aneurysm rupture risk. Propensity score weighting was used to account for differences in baseline characteristics of the cohorts. Results: Lipid-lowering agent use was significantly inversely associated with rupture status (odds ratio, 0.58; 95% confidence interval, 0.47-0.71). In a subgroup analysis of complete cases that includes both lipid-lowering agent use and lipid values, higher HDL levels (odds ratio, 0.95; 95% confidence interval, 0.93-0.98) and lipid-lowering agent use (odds ratio, 0.41; 95% confidence interval, 0.23-0.73) were both significantly and inversely associated with rupture status, whereas total cholesterol and LDL levels were not significant. A monotonic exposure-response curve between HDL levels and risk of aneurysmal rupture was obtained. Conclusions: Higher

Although the correlation between serum total cholesterol and risk of coronary heart disease has been well established, the relation with aneurysmal subarachnoid hemorrhage (aSAH) remains controversial with studies reporting both increased and decreased associations. A recent systematic review of 21 studies investigating the association between cholesterol and risk of SAH showed that elevated total cholesterol level increases the risk for SAH in men.1 However, study sizes of included studies were small, ranging from 55 to 858 patients with ruptured intracranial aneurysms.1 Moreover, only 4 studies included HDL (high-density lipoprotein) values, whereas none of the studies assessed LDL (low-density lipoprotein) values.1 In addition, growing evidence from various experimental animal models and smaller clinical studies supports the inverse relationship between statin use and intracranial aneurysm rupture.2-4 Here, we present the largest case-control study to date, to investigate the role of total cholesterol, HDL, LDL, and use of lipid-lowering agents on the risk of SAH in 4701 patients with 6411 intracranial aneurysms.

Methods: The data that support the findings of this study are available from the corresponding author on reasonable request. We included 4701 patients who were diagnosed with an intracranial aneurysm


