William R. Hogan scite author profile

Data in computer-based patient records (CPRs) have many uses beyond their primary role in patient care, including research and health-system management. Although the accuracy of CPR data directly affects these applications, there has been only sporadic interest in, and no previous review of, data accuracy in CPRs. This paper reviews the published studies of data accuracy in CPRs. These studies report highly variable levels of accuracy. This variability stems from differences in study design, in types of data studied, and in the CPRs themselves. These differences confound interpretation of this literature. We conclude that our knowledge of data accuracy in CPRs is not commensurate with its importance and further studies are needed. We propose methodological guidelines for studying accuracy that address shortcomings of the current literature. As CPR data are used increasingly for research, methods used in research databases to continuously monitor and improve accuracy should be applied to CPRs.

Algorithms for rapid outbreak detection: a research synthesis

Buckeridge

Burkom

Campbell

et al. 2005

Journal of Biomedical Informatics

177

136

The threat of bioterrorism has stimulated interest in enhancing public health surveillance to detect disease outbreaks more rapidly than is currently possible. To advance research on improving the timeliness of outbreak detection, the Defense Advanced Research Projects Agency sponsored the Bio-event Advanced Leading Indicator Recognition Technology (BioALIRT) project beginning in 2001. The purpose of this paper is to provide a synthesis of research on outbreak detection algorithms conducted by academic and industrial partners in the BioALIRT project. We first suggest a practical classification for outbreak detection algorithms that considers the types of information encountered in surveillance analysis. We then present a synthesis of our research according to this classification. The research conducted for this project has examined how to use spatial and other covariate information from disparate sources to improve the timeliness of outbreak detection. Our results suggest that use of spatial and other covariate information can improve outbreak detection performance. We also identified, however, methodological challenges that limited our ability to determine the benefit of using outbreak detection algorithms that operate on large volumes of data. Future research must address challenges such as forecasting expected values in high-dimensional data and generating spatial and multivariate test data sets.

Natural Language Processing methods and systems for biomedical ontology learning

Liu

Hogan

Crowley

2011

Journal of Biomedical Informatics

126

While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they must achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships as well as difficulty in updating the ontology as knowledge changes. Methodologies developed in the fields of natural language processing, information extraction, information retrieval and machine learning provide techniques for automating the enrichment of an ontology from free-text documents. In this article, we review existing methodologies and developed systems, and discuss how existing methods can benefit the development of biomedical ontologies.

Extracting social determinants of health from electronic health records using natural language processing: a systematic review

Patra

Sharma

Vekaria

et al. 2021

Objective: Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs. Materials and Methods: A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review. Results: Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9). Conclusion: NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.

The Accuracy of Medication Data in an Outpatient Electronic Medical Record

Wagner

Hogan

1996

Journal of the American Medical Informatics Association

102

Medication records in an outpatient EMR may have significant levels of data error. Based on an analysis of correctable causes of error, the authors conclude that the most effective extension to the EMR studied would be to expand its scope to include all clinicians who can potentially change medications. Even with EMR extensions, however, ineradicable error due to patients and data entry will remain. Several implications of ineradicable error for MDSSs are discussed. The provision of a free-text "comments" field increased the accuracy of medication lists for clinician users at the expense of accuracy for a MDSS.

A large language model for electronic health records

et al. 2022

There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest of which trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model—GatorTron—using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og.

Detection of Pediatric Respiratory and Diarrheal Outbreaks from Sales of Over-the-counter Electrolyte Products

Hogan

Tsui

Ivanov

et al. 2003

J Am Med Inform Assoc

MySurgeryRisk: Development and Validation of a Machine-learning Risk Algorithm for Major Complications and Death After Surgery

Accuracy of Data in Computer-based Patient Records
