James D. Buntrock scite author profile

This article describes our system entry for the 2006 I2B2 contest "Challenges in Natural Language Processing for Clinical Data" for the task of identifying the smoking status of patients. Our system makes the simplifying assumption that patient-level smoking status determination can be achieved by accurately classifying individual sentences from a patient's record. We created our system with reusable text analysis components built on the Unstructured Information Management Architecture and Weka. This reuse of code minimized the development effort related specifically to our smoking status classifier. We report precision, recall, F-score, and 95% exact confidence intervals for each metric. Recasting the classification task for the sentence level and reusing code from other text analysis projects allowed us to quickly build a classification system that performs with a system F-score of 92.64 based on held-out data tests and of 85.57 on the formal evaluation data. Our general medical natural language engine is easily adaptable to a real-world medical informatics application. Some of the limitations as applied to the use-case are negation detection and temporal resolution.

show abstract

Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier

Pakhomov

Buntrock

Chute

2005

Journal of Biomedical Informatics

View full text Add to dashboard Cite

This paper addresses a very specific problem of identifying patients diagnosed with a specific condition for potential recruitment in a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes dictated at Mayo Clinic. This method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors, where features are a mix of demographic information as well as single words and concept mappings to MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term spotting method with respect to their accuracy and recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples (95 vs. 86%) but worse accuracy than Perceptron (57 vs. 65%). Both algorithms perform better than the baseline with recall on positive samples of 71% and accuracy of 54%.

show abstract

Toward a Learning Health-care System – Knowledge Delivery at the Point of Care Empowered by Big Data and NLP

Kaggal

Elayavilli

Mehrabi

et al. 2016

Biomed Inform Insights

View full text Add to dashboard Cite

The concept of optimizing health care by understanding and generating knowledge from previous evidence, ie, the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future.

show abstract

LexGrid: A Framework for Representing, Storing, and Querying Biomedical Terminologies from Simple to Sublime

Pathak

Solbrig

Buntrock

et al. 2009

Journal of the American Medical Informatics Association

View full text Add to dashboard Cite

Many biomedical terminologies, classifications, and ontological resources such as the NCI Thesaurus (NCIT), International Classification of Diseases (ICD), Systematized Nomenclature of Medicine (SNOMED), Current Procedural Terminology (CPT), and Gene Ontology (GO) have been developed and used to build a variety of IT applications in biology, biomedicine, and health care settings. However, virtually all these resources involve incompatible formats, are based on different modeling languages, and lack appropriate tooling and programming interfaces (APIs) that hinder their wide-scale adoption and usage in a variety of application contexts. The Lexical Grid (LexGrid) project introduced in this paper is an ongoing community-driven initiative, coordinated by the Mayo Clinic Division of Biomedical Statistics and Informatics, designed to bridge this gap using a common terminology model called the LexGrid model. The key aspect of the model is to accommodate multiple vocabulary and ontology distribution formats and support of multiple data stores for federated vocabulary distribution. The model provides a foundation for building consistent and standardized APIs to access multiple vocabularies that support lexical search queries, hierarchy navigation, and a rich set of features such as recursive subsumption (e.g., get all the children of the concept penicillin). Existing LexGrid implementations include the LexBIG API as well as a reference implementation of the HL7 Common Terminology Services (CTS) specification providing programmatic access via Java, Web, and Grid services.

show abstract

Big Data and the Electronic Health Record

Peters¹,

Buntrock²

2014

View full text Add to dashboard Cite

The electronic medical record has evolved from a digital representation of individual patient results and documents to information of large scale and complexity. Big Data refers to new technologies providing management and processing capabilities, targeting massive and disparate data sets. For an individual patient, techniques such as Natural Language Processing allow the integration and analysis of textual reports with structured results. For groups of patients, Big Data offers the promise of large-scale analysis of outcomes, patterns, temporal trends, and correlations. The evolution of Big Data analytics moves us from description and reporting to forecasting, predictive modeling, and decision optimization.

show abstract

Identification of patients with congestive heart failure using a binary classifier

Pakhomov

Buntrock

Chute

2003

View full text Add to dashboard Cite

This paper addresses a very specific problem that happens to be common in health science research. We present a machine learning based method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes. This method relies on a Perceptron neural network classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors where features are a mix of single words and concept mappings to MeSH and HICDA ontologies. The method is designed and implemented to support a particular epidemiological study but has broader implications for clinical research. In this paper, we describe the method and present experimental classification results based on classification accuracy and positive predictive value.

show abstract

High throughput modularized NLP system for clinical text

Pakhomov

Buntrock

Duffy

2005

View full text Add to dashboard Cite

This paper presents the results of the development of a high throughput, real time modularized text analysis and information retrieval system that identifies clinically relevant entities in clinical notes, maps the entities to several standardized nomenclatures and makes them available for subsequent information retrieval and data mining. The performance of the system was validated on a small collection of 351 documents partitioned into 4 query topics and manually examined by 3 physicians and 3 nurse abstractors for relevance to the query topics. We find that simple key phrase searching results in 73% recall and 77% precision. A combination of NLP approaches to indexing improve the recall to 92%, while lowering the precision to 67%.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

James D. Buntrock

Automating the Assignment of Diagnosis Codes to Patient Encounters Using Example-based and Machine Learning Techniques

Mayo Clinic NLP System for Patient Smoking Status Identification

Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier

Toward a Learning Health-care System – Knowledge Delivery at the Point of Care Empowered by Big Data and NLP

LexGrid: A Framework for Representing, Storing, and Querying Biomedical Terminologies from Simple to Sublime

Big Data and the Electronic Health Record

Identification of patients with congestive heart failure using a binary classifier

High throughput modularized NLP system for clinical text

Contact Info

Product

Resources

About