Medication information is one of the most important types of clinical data in electronic medical records. It is critical for healthcare safety and quality, as well as for clinical research that uses electronic medical record data. However, medication data are often recorded in clinical notes as free text and are therefore not accessible to other computerized applications that rely on coded data. We describe a new natural language processing system, MedEx, which extracts medication information from clinical notes. MedEx was initially developed using discharge summaries. An evaluation on a data set of 50 discharge summaries showed that it performed well in identifying not only drug names (F-measure 93.2%) but also signature information such as strength, route, and frequency, with F-measures of 94.5%, 93.9%, and 96.0%, respectively. We then applied MedEx, unchanged, to outpatient clinic visit notes; it performed similarly, with F-measures over 90% on a set of 25 clinic visit notes.
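As a rough illustration of the kind of signature information described above (drug name, strength, route, frequency), the sketch below pulls those fields out of free text with a single regular expression. The pattern and the tiny route/frequency vocabularies are invented for illustration only; MedEx itself uses far more sophisticated NLP machinery than a regex.

```python
import re

# Toy signature extractor: drug name, strength, route, frequency.
# Pattern and vocabularies are hypothetical, not MedEx's actual grammar.
SIG_PATTERN = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml))\s+"
    r"(?P<route>po|iv|im|sc)\s+"
    r"(?P<frequency>daily|bid|tid|qid|q\d+h)",
    re.IGNORECASE,
)

def extract_sig(text):
    """Return medication signature fields found in free text."""
    return [m.groupdict() for m in SIG_PATTERN.finditer(text)]

hits = extract_sig("Continue lisinopril 10 mg po daily and metformin 500 mg po bid.")
```

A real system must also handle abbreviations, misspellings, and discontinuous mentions, which is why rule-plus-parser approaches outperform single patterns like this one.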
Summary: BioCaster is an ontology-based text mining system for detecting and tracking the distribution of infectious disease outbreaks from linguistic signals on the Web. The system continuously analyzes documents reported from over 1700 RSS feeds, classifies them for topical relevance and plots them onto a Google map using geocoded information. The background knowledge for bridging the gap between layman's terms and formal coding systems is contained in the freely available BioCaster ontology, which includes information in eight languages focused on the epidemiological role of pathogens as well as geographical locations with their latitudes/longitudes. The system consists of four main stages: topic classification, named entity recognition (NER), disease/location detection and event recognition. Higher-order event analysis is used to detect more precisely specified warning signals that can then be notified to registered users via email alerts. Evaluation of the system for topic recognition and entity identification is conducted on a gold standard corpus of annotated news articles.
Availability: The BioCaster map and ontology are freely available via a web portal at http://www.biocaster.org
Contact: collier@nii.ac.jp
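The four stages named above can be sketched as a minimal pipeline. The keyword lists and the disease/location pairing rule below are illustrative stand-ins, not BioCaster's actual ontology-driven classifiers.

```python
# Minimal sketch of a four-stage pipeline in the spirit of BioCaster
# (topic classification -> NER -> disease/location detection -> event
# recognition). Vocabularies here are invented for illustration.

DISEASES = {"avian influenza", "cholera", "ebola"}
LOCATIONS = {"Hanoi", "Jakarta", "Lagos"}

def classify_topic(doc):
    # Stage 1: is the report about an outbreak at all?
    return any(d in doc.lower() for d in DISEASES)

def recognize_entities(doc):
    # Stage 2: dictionary-based NER for diseases and locations.
    low = doc.lower()
    return ([d for d in DISEASES if d in low],
            [loc for loc in LOCATIONS if loc in doc])

def detect_event(doc):
    # Stages 3-4: pair a disease with a location into a candidate event.
    if not classify_topic(doc):
        return None
    diseases, locations = recognize_entities(doc)
    if diseases and locations:
        return {"disease": diseases[0], "location": locations[0]}
    return None

event = detect_event("Officials in Hanoi confirmed new avian influenza cases.")
```

In the deployed system each stage is a trained or ontology-backed model rather than a dictionary lookup, but the staged hand-off shown here is the same.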
Social media such as Facebook and Twitter have proven to be a useful resource for understanding public opinion towards real-world events. In this paper, we investigate over 1.5 million Twitter messages (tweets) for the period 9th March 2011 to 31st May 2011 in order to track awareness and anxiety levels in the Tokyo metropolitan district in response to the 2011 Tohoku Earthquake and the subsequent tsunami and nuclear emergencies. These three events were tracked using both English and Japanese tweets. Preliminary results indicated: 1) close correspondence between Twitter data and earthquake events, 2) strong correlation between English and Japanese tweets on the same events, 3) tweets in the native language play an important role in early warning, and 4) tweets showed how quickly Japanese people's anxiety returned to normal levels after the earthquake event. Several distinctions between English and Japanese tweets on earthquake events are also discussed. The results suggest that Twitter data can serve as a useful resource for tracking the public mood of populations affected by natural disasters, as well as a basis for an early warning system.
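The cross-language correlation reported above is the kind of check the sketch below performs: Pearson correlation between daily English and Japanese tweet volumes. The daily counts here are invented for illustration, not the paper's data.

```python
from math import sqrt

# Hypothetical daily tweet counts for the same event, English vs. Japanese.
en_counts = [120, 3400, 2900, 800, 400, 250]
ja_counts = [900, 25000, 21000, 6000, 3100, 1800]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(en_counts, ja_counts)  # close to 1.0 for strongly correlated series
```

A high r on aligned daily counts is what "strong correlation between English and Japanese tweets on the same events" amounts to operationally.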
Self-reported patient data has been shown to be a valuable knowledge source for post-market pharmacovigilance. In this paper we propose using the popular micro-blogging service Twitter to gather evidence about adverse drug reactions (ADRs) after first identifying micro-blog messages (also known as "tweets") that report first-hand experience. To achieve this goal we explore machine learning with data crowdsourced from lay annotators. With the help of lay annotators recruited from CrowdFlower we manually annotated 1548 tweets containing keywords related to two kinds of drugs: SSRIs (e.g. paroxetine) and cognitive enhancers (e.g. Ritalin). Our results show that inter-annotator agreement among the crowdsourced annotators (Fleiss' kappa) was moderate, and that their annotations correlated with those of a pair of experienced annotators (Spearman's rho = 0.471). We used the gold standard annotations from CrowdFlower to train a range of supervised machine learning models to recognize first-hand experience. F-scores are reported for 6 of these techniques, with the Bayesian Generalized Linear Model performing best (F-score = 0.64, informedness = 0.43) when combined with a set of features selected using the information gain criterion.
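Fleiss' kappa, the agreement statistic used above for the crowdsourced annotations, can be computed from a per-item table of category counts. The sketch below implements the standard formula; the 3-rater, 2-category ratings are invented for illustration.

```python
# Fleiss' kappa for multiple raters assigning items to categories.
# ratings: one row per item, giving how many raters chose each category.

def fleiss_kappa(ratings):
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    # Per-item observed agreement P_i.
    p_items = [(sum(c * c for c in row) - n_raters) /
               (n_raters * (n_raters - 1)) for row in ratings]
    p_bar = sum(p_items) / n_items
    # Chance agreement P_e from marginal category proportions.
    totals = [sum(col) for col in zip(*ratings)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# 3 hypothetical raters, 2 categories (first-hand experience vs. not):
kappa = fleiss_kappa([[3, 0], [3, 0], [2, 1], [0, 3], [1, 2]])
```

Values of roughly 0.4 to 0.6 are conventionally read as "moderate" agreement, which matches how the paper characterizes its crowd annotations.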
In modern electronic medical records (EMR), much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) is not provided in structured data fields, but rather is encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of "unlocking" this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. A general NLP system architecture consists of two main components: background knowledge, which includes biomedical knowledge resources, and a framework that integrates NLP tools to process text. Systems differ in both components, which we review briefly. Additionally, challenges facing current research efforts in biomedical NLP include the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.

Introduction
In modern electronic medical records (EMR), most of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) is not provided in structured data fields, but rather is encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of "unlocking" this important data source, converting unstructured text to structured, actionable data for use in applications for clinical decision support, quality assurance, and public health surveillance. There are currently many NLP systems that have been successfully applied to biomedical text.
It is not our goal to review all of them in this chapter, but rather to provide an overview of how the field evolved from producing monolithic software, built on the platforms available at the time of development, to contemporary component-based systems built on top of general frameworks. More importantly, the performance of these systems is tightly associated with their "ingredients" (i.e., the modules that form their background knowledge) and with how those modules are combined on top of the general framework. We highlight certain systems based on their landmark status as well as on the diversity of components and frameworks they are based on [7]. The review in this chapter differs from previous work in that it emphasizes the historical development of landmark clinical NLP systems and presents each system in light of a unified system architecture. We consider that each NLP system in biomedicine contains two main components: biomedical background knowledge and a framework that integrates NLP tools. In the rest of this paper, we first outline our model architecture for NLP systems in biomedicine, before reviewing and summarizing representative NLP systems, starting with an early NLP system, LSP-MLP, and closing with a more recent system, cTAKES. Finally, we will discuss...
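The two-component view described above (background knowledge plus a framework that chains NLP modules) can be sketched in a few lines. The module names and the toy lexicon below are illustrative, not taken from any particular system.

```python
# Sketch of the unified architecture: a framework that runs ordered NLP
# components, each drawing on shared background knowledge.

class Pipeline:
    def __init__(self, knowledge, components):
        self.knowledge = knowledge      # e.g. lexicons, ontologies
        self.components = components    # ordered NLP modules

    def run(self, text):
        doc = {"text": text, "annotations": []}
        for component in self.components:
            doc = component(doc, self.knowledge)
        return doc

def tokenizer(doc, kb):
    doc["tokens"] = doc["text"].split()
    return doc

def concept_tagger(doc, kb):
    # Dictionary lookup against the background knowledge component.
    doc["annotations"] = [t for t in doc["tokens"] if t.lower() in kb["lexicon"]]
    return doc

kb = {"lexicon": {"fever", "cough"}}
doc = Pipeline(kb, [tokenizer, concept_tagger]).run("Patient reports fever and cough")
```

Swapping either component independently, a richer ontology, or a different tagger, is exactly the modularity that distinguishes framework-based systems from the earlier monolithic ones.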
Background Twitter messages (tweets) cover many of the topics encountered in our daily life, including health-related topics. Analysis of health-related tweets can help us understand the health conditions and concerns encountered in daily life. In this paper we evaluate an approach to extracting causalities from tweets using natural language processing (NLP) techniques. Methods Lexico-syntactic patterns based on dependency parser outputs are used for causality extraction. We focused on three health-related topics: “stress”, “insomnia”, and “headache”. A large dataset consisting of 24 million tweets is used. Results The results show the proposed approach achieved an average precision between 74.59% and 92.27% in comparison with human annotations. Conclusions Manual analysis of the extracted causalities reveals interesting findings about expressions on health-related topics posted by Twitter users.
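A surface-level approximation of the lexico-syntactic causality patterns described above can be written with regular expressions. The actual method matches patterns over dependency-parse output, which these regexes only roughly emulate; the pattern set below is a hypothetical sample, not the paper's inventory.

```python
import re

# Toy cause/effect patterns over raw text (the real system uses
# dependency-parse structures, not surface regexes).
PATTERNS = [
    re.compile(r"(?P<effect>\w+)\s+because\s+of\s+(?P<cause>\w+)", re.I),
    re.compile(r"(?P<cause>\w+)\s+(?:causes|gives\s+me)\s+(?P<effect>\w+)", re.I),
]

def extract_causality(tweet):
    """Return a (cause, effect) pair if any pattern matches, else None."""
    for pat in PATTERNS:
        m = pat.search(tweet)
        if m:
            return (m.group("cause"), m.group("effect"))
    return None

pair = extract_causality("so much stress because of work today")
```

Dependency-based patterns generalize better than surface ones, since they still match when modifiers or clauses intervene between the cause and effect words.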
This paper explores the benefits of using n-grams and semantic features for the classification of disease outbreak reports, in the context of a text mining system, BioCaster, that identifies and tracks emerging infectious disease outbreaks from online news. We show that a combination of bag-of-words features, n-grams and semantic features, in conjunction with feature selection, improves classification accuracy at a statistically significant level when compared to previous work. A novel aspect of the work reported in this paper is the use of a semantic tagger, the USAS tagger, to generate features.
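Feature selection of the kind used above is commonly done by ranking candidate features with information gain. The sketch below ranks unigram features this way; the four labeled documents are invented for illustration.

```python
from math import log2

# Toy information-gain ranking of word features for outbreak-report
# classification. Documents and labels are made up.
docs = [("bird flu outbreak reported", 1),
        ("new flu cases confirmed", 1),
        ("stock market rallies again", 0),
        ("market opens higher today", 0)]

def entropy(labels):
    total = len(labels)
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / total
        result -= p * log2(p)
    return result

def info_gain(feature, docs):
    """Reduction in label entropy from knowing whether `feature` occurs."""
    labels = [y for _, y in docs]
    with_f = [y for text, y in docs if feature in text.split()]
    without_f = [y for text, y in docs if feature not in text.split()]
    gain = entropy(labels)
    for subset in (with_f, without_f):
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

vocab = {w for text, _ in docs for w in text.split()}
ranked = sorted(vocab, key=lambda f: info_gain(f, docs), reverse=True)
```

Features whose presence perfectly separates the classes (here "flu" and "market") get the maximum gain and survive selection; near-uninformative ones are pruned, which is what tends to lift accuracy.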
The results show that the existing MedEx system, together with other NLP components, can extract medication information with reasonable performance from clinical text produced at institutions other than the site of algorithm development.