Sean Finan scite author profile

This article discusses the requirements of a formal specification for the annotation of temporal information in clinical narratives. We discuss the implementation and extension of ISO-TimeML for annotating a corpus of clinical notes, known as the THYME corpus. To reflect the information task and the heavily inference-based reasoning demands in the domain, a new annotation guideline has been developed, “the THYME Guidelines to ISO-TimeML (THYME-TimeML)”. To clarify what relations merit annotation, we distinguish between linguistically derived and inferentially derived temporal orderings in the text. We also apply a top-performing TempEval 2013 system against this new resource to measure the difficulty of adapting systems to the clinical domain. The corpus is available to the community and has been proposed for use in a SemEval 2015 task.

Association of intracranial aneurysm rupture with smoking duration, intensity, and cessation

Can et al. 2017

Large-scale identification of patients with cerebral aneurysms using natural language processing

et al. 2017

Objective: To use natural language processing (NLP) in conjunction with the electronic medical record (EMR) to accurately identify patients with cerebral aneurysms and their matched controls. Methods: ICD-9 and Current Procedural Terminology codes were used to obtain an initial data mart of potential aneurysm patients from the EMR. NLP was then used to train a classification algorithm with .632 bootstrap cross-validation used for correction of overfitting bias. The classification rule was then applied to the full data mart. Additional validation was performed on 300 patients classified as having aneurysms. Controls were obtained by matching age, sex, race, and healthcare use. Results: We identified 55,675 patients of 4.2 million patients with ICD-9 and Current Procedural Terminology codes consistent with cerebral aneurysms. Of those, 16,823 patients had the term aneurysm occur near relevant anatomic terms. After training, a final algorithm consisting of 8 coded and 14 NLP variables was selected, yielding an overall area under the receiver operating characteristic curve of 0.95. After the final algorithm was applied, 5,589 patients were classified as having aneurysms, and 54,952 controls were matched to those patients. The positive predictive value based on a validation cohort of 300 patients was 0.86. Conclusions: We harnessed the power of the EMR by applying NLP to obtain a large cohort of patients with intracranial aneurysms and their matched controls. Such algorithms can be generalized to other diseases for epidemiologic and genetic studies. Cerebral aneurysm is a potentially devastating disorder that affects nearly 3% of the population.

DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records

Savova, Tseytlin, Finan et al. 2017

Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes is currently performed mostly by hand, making it difficult to correlate phenotypic data with genomic data. In addition, genomic data is being produced at an ever faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from Electronic Medical Records of cancer patients. The system implements advanced Natural Language Processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of the University of Pittsburgh Medical Center (UPMC) breast cancer patients. The resulting platform provides critical and missing computational methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment.

Lipid-Lowering Agents and High HDL (High-Density Lipoprotein) Are Inversely Associated With Intracranial Aneurysm Rupture

Can, Castro, Dligach et al. 2018. Stroke

Background and Purpose: Growing evidence from experimental animal models and clinical studies suggests the protective effect of statin use against rupture of intracranial aneurysms; however, results from large studies detailing the relationship between intracranial aneurysm rupture and total cholesterol, HDL (high-density lipoprotein), LDL (low-density lipoprotein), and lipid-lowering agent use are lacking. Methods: The medical records of 4701 patients with 6411 intracranial aneurysms diagnosed at the Massachusetts General Hospital and the Brigham and Women's Hospital between 1990 and 2016 were reviewed and analyzed. Patients were separated into ruptured and nonruptured groups. Univariable and multivariable logistic regression analyses were performed to determine the effects of lipids (total cholesterol, LDL, and HDL) and lipid-lowering medications on intracranial aneurysm rupture risk. Propensity score weighting was used to account for differences in baseline characteristics of the cohorts. Results: Lipid-lowering agent use was significantly inversely associated with rupture status (odds ratio, 0.58; 95% confidence interval, 0.47-0.71). In a subgroup analysis of complete cases that includes both lipid-lowering agent use and lipid values, higher HDL levels (odds ratio, 0.95; 95% confidence interval, 0.93-0.98) and lipid-lowering agent use (odds ratio, 0.41; 95% confidence interval, 0.23-0.73) were both significantly and inversely associated with rupture status, whereas total cholesterol and LDL levels were not significant. A monotonic exposure-response curve between HDL levels and risk of aneurysmal rupture was obtained. Conclusions: Higher…

Although the correlation between serum total cholesterol and risk of coronary heart disease has been well established, the relation with aneurysmal subarachnoid hemorrhage (aSAH) remains controversial with studies reporting both increased and decreased associations. A recent systematic review of 21 studies investigating the association between cholesterol and risk of SAH showed that elevated total cholesterol level increases the risk for SAH in men [1]. However, study sizes of included studies were small, ranging from 55 to 858 patients with ruptured intracranial aneurysms [1]. Moreover, only 4 studies included HDL (high-density lipoprotein) values, whereas none of the studies assessed LDL (low-density lipoprotein) values [1]. In addition, growing evidence from various experimental animal models and smaller clinical studies supports the inverse relationship between statin use and intracranial aneurysm rupture [2-4]. Here, we present the largest case-control study to date, to investigate the role of total cholesterol, HDL, LDL, and use of lipid-lowering agents on the risk of SAH in 4701 patients with 6411 intracranial aneurysms.

Methods: The data that support the findings of this study are available from the corresponding author on reasonable request. We included 4701 patients who were diagnosed with an intracranial aneurysm

Identification of subjects with polycystic ovary syndrome using electronic health records

Castro, Shen et al. 2015. Reprod Biol Endocrinol

Background: Polycystic ovary syndrome (PCOS) is a heterogeneous disorder because of the variable criteria used for diagnosis. Therefore, International Classification of Diseases 9 (ICD-9) codes may not accurately capture the diagnostic criteria necessary for large scale PCOS identification. We hypothesized that use of electronic medical records text and data would more specifically capture PCOS subjects. Methods: Subjects with PCOS were identified in the Partners Healthcare Research Patients Data Registry by searching for the term “polycystic ovary syndrome” using natural language processing (n = 24,930). A training subset of 199 identified charts was reviewed and categorized based on likelihood of a true Rotterdam PCOS diagnosis, i.e. two out of three of the following: irregular menstrual cycles, hyperandrogenism and/or polycystic ovary morphology. Data from the history, physical exam, laboratory and radiology results were codified and extracted from notes of definite PCOS subjects. Thirty-two terms were used to build an algorithm for identifying definite PCOS cases and applied to the rest of the dataset. The positive predictive value cutoff was set at 76.8% to maximize the number of subjects available for study. A true positive predictive value for the algorithm was calculated after review of 100 charts from subjects identified as definite PCOS cases with at least two documented Rotterdam criteria. The positive predictive value was compared to that calculated using 200 charts identified using the ICD-9 code for PCOS (256.4; n = 13,670). In addition, a cohort of previously recruited PCOS subjects was submitted for algorithm validation. Results: Chart review demonstrated that 64% were confirmed as definitely PCOS using the algorithm, with a 9% false positive rate. 66% of subjects identified by ICD-9 code for PCOS could be confirmed as definitely PCOS, with an 8.5% false positive rate. There was no significant difference in the positive predictive values using the two methods (p = 0.2). However, the number of charts that had insufficient confirmatory data was lower using the algorithm (5% vs 11%; p < 0.04). Of 477 subjects with PCOS recruited and examined individually and present in the database as patients, 451 were found within the algorithm dataset. Conclusions: Extraction of text parameters along with codified data improves the confidence in PCOS patient cohorts identified using the electronic medical record. However, the positive predictive value was not significantly different when using ICD-9 codes or the specific algorithm. Further studies are needed to determine the positive predictive value of the two methods in additional electronic medical record datasets. Electronic supplementary material: The online version of this article (doi:10.1186/s12958-015-0115-z) contains supplementary material, which is available to authorized users.

Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem

et al. 2018

Screening pregnant women for suicidal behavior in electronic medical records: diagnostic codes vs. clinical notes processed by natural language processing

Zhong, Karlson, Gelaye et al. 2018. BMC Med Inform Decis Mak

Background: We examined the comparative performance of structured, diagnostic codes vs. natural language processing (NLP) of unstructured text for screening suicidal behavior among pregnant women in electronic medical records (EMRs). Methods: Women aged 10–64 years with at least one diagnostic code related to pregnancy or delivery (N = 275,843) from Partners HealthCare were included as our “datamart.” Diagnostic codes related to suicidal behavior were applied to the datamart to screen women for suicidal behavior. Among women without any diagnostic codes related to suicidal behavior (n = 273,410), 5880 women were randomly sampled, of whom 1120 had at least one mention of terms related to suicidal behavior in clinical notes. NLP was then used to process clinical notes for the 1120 women. Chart reviews were performed for subsamples of women. Results: Using diagnostic codes, 196 pregnant women were screened positive for suicidal behavior, among whom 149 (76%) had confirmed suicidal behavior by chart review. Using NLP among those without diagnostic codes, 486 pregnant women were screened positive for suicidal behavior, among whom 146 (30%) had confirmed suicidal behavior by chart review. Conclusions: The use of NLP substantially improves the sensitivity of screening suicidal behavior in EMRs. However, the prevalence of confirmed suicidal behavior was lower among women who did not have diagnostic codes for suicidal behavior but screened positive by NLP. NLP should be used together with diagnostic codes for future EMR-based phenotyping studies for suicidal behavior. Electronic supplementary material: The online version of this article (10.1186/s12911-018-0617-7) contains supplementary material, which is available to authorized users.
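The two confirmation percentages quoted in this abstract can be checked with simple arithmetic. The sketch below is only an illustration, not the authors' code: the function name `confirmed_fraction` is hypothetical, and the only inputs are the counts stated in the abstract.

```python
def confirmed_fraction(confirmed: int, screened_positive: int) -> float:
    """Share of screen-positive women whose suicidal behavior was confirmed by chart review."""
    if screened_positive <= 0:
        raise ValueError("screened_positive must be a positive count")
    return confirmed / screened_positive


# Counts as reported in the abstract (hypothetical variable names).
codes_fraction = confirmed_fraction(149, 196)  # screened by diagnostic codes
nlp_fraction = confirmed_fraction(146, 486)    # screened by NLP, no diagnostic codes

print(f"Diagnostic codes: {codes_fraction:.0%}")
print(f"NLP: {nlp_fraction:.0%}")
```

Rounded to whole percentages, these ratios match the 76% and 30% reported in the Results.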

Temporal Annotation in the Clinical Domain
