We constructed an automated predictive analytics framework for machine-learning algorithm with high discriminatory ability for assessing the risk of surgical complications and death using readily available preoperative electronic health records data. The feasibility of this novel algorithm implemented in real time clinical workflow requires further testing.
Data in computer-based patient records (CPRs) have many uses beyond their primary role in patient care, including research and health-system management. Although the accuracy of CPR data directly affects these applications, there has been only sporadic interest in, and no previous review of, data accuracy in CPRs. This paper reviews the published studies of data accuracy in CPRs. These studies report highly variable levels of accuracy. This variability stems from differences in study design, in types of data studied, and in the CPRs themselves. These differences confound interpretation of this literature. We conclude that our knowledge of data accuracy in CPRs is not commensurate with its importance and further studies are needed. We propose methodological guidelines for studying accuracy that address shortcomings of the current literature. As CPR data are used increasingly for research, methods used in research databases to continuously monitor and improve accuracy should be applied to CPRs.
The threat of bioterrorism has stimulated interest in enhancing public health surveillance to detect disease outbreaks more rapidly than is currently possible. To advance research on improving the timeliness of outbreak detection, the Defense Advanced Research Project Agency sponsored the Bio-event Advanced Leading Indicator Recognition Technology (BioALIRT) project beginning in 2001. The purpose of this paper is to provide a synthesis of research on outbreak detection algorithms conducted by academic and industrial partners in the BioALIRT project. We first suggest a practical classification for outbreak detection algorithms that considers the types of information encountered in surveillance analysis. We then present a synthesis of our research according to this classification. The research conducted for this project has examined how to use spatial and other covariate information from disparate sources to improve the timeliness of outbreak detection. Our results suggest that use of spatial and other covariate information can improve outbreak detection performance. We also identified, however, methodological challenges that limited our ability to determine the benefit of using outbreak detection algorithms that operate on large volumes of data. Future research must address challenges such as forecasting expected values in high-dimensional data and generating spatial and multivariate test data sets.
While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they must achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships as well as difficulty in updating the ontology as knowledge changes. Methodologies developed in the fields of natural language processing, information extraction, information retrieval and machine learning provide techniques for automating the enrichment of an ontology from free-text documents. In this article, we review existing methodologies and developed systems, and discuss how existing methods can benefit the development of biomedical ontologies.
Objective Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs. Materials and Methods A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review. Results Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9). Conclusion NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
Medication records in an outpatient EMR may have significant levels of data error. Based on an analysis of correctable causes of error, the authors conclude that the most effective extension to the EMR studied would be to expand its scope to include all clinicians who can potentially change medications. Even with EMR extensions, however, ineradicable error due to patients and data entry will remain. Several implications of ineradicable error for MDSSs are discussed. The provision of a free-text "comments" field increased the accuracy of medication lists for clinician users at the expense of accuracy for a MDSS.
There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest of which trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model—GatorTron—using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og.
Sales of electrolyte products contain a signal of outbreaks of respiratory and diarrheal diseases in children and usually are an earlier signal than hospital diagnoses.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.