There is increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, few language models have been trained in the clinical domain, and the largest of them is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model—GatorTron—using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks: clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data benefit these NLP tasks. GatorTron scales the clinical language model from 110 million to 8.9 billion parameters and improves all five clinical NLP tasks (e.g., 9.6% and 9.5% accuracy improvements on NLI and MQA, respectively), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og.
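To make the parameter scales concrete, a back-of-the-envelope count for a BERT-style encoder can be derived from its configuration. This is only a sketch: the hyperparameters below are the standard BERT-base values, which reproduce the roughly 110-million-parameter figure mentioned above; GatorTron's exact configurations are not given here.

```python
def transformer_param_count(vocab_size, hidden, layers, max_pos=512, type_vocab=2):
    """Approximate parameter count of a BERT-style encoder."""
    # Token, position, and segment embeddings.
    embeddings = (vocab_size + max_pos + type_vocab) * hidden
    # Self-attention: Q, K, V, and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward block with the conventional 4x expansion (weights + biases).
    ffn = 2 * (hidden * 4 * hidden) + 4 * hidden + hidden
    # Two layer norms per layer (scale + shift each).
    layer_norms = 2 * 2 * hidden
    per_layer = attention + ffn + layer_norms
    return embeddings + layers * per_layer

# Standard BERT-base configuration: 12 layers, hidden size 768, WordPiece vocab 30522.
base = transformer_param_count(vocab_size=30522, hidden=768, layers=12)
print(f"BERT-base estimate: {base / 1e6:.0f}M parameters")  # ~109M
```

The same arithmetic shows why parameter count grows roughly quadratically in hidden size and linearly in depth, which is how billion-parameter models arise from wider and deeper configurations.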
Objective Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs. Materials and Methods A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review. Results Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9). Conclusion NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
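As an illustration of the rule-based approaches the review describes, a minimal keyword-pattern extractor for smoking status might look as follows. The patterns are invented for illustration and are not drawn from any published lexicon or from the reviewed systems.

```python
import re

# Illustrative rule-based SDoH extractor for smoking status.
# Rules are checked in order; negated/never patterns come first so that
# "never smoker" is not swallowed by the broader "smoker" patterns.
SMOKING_RULES = [
    ("never",   re.compile(r"\b(never smoker|denies (ever )?smoking|non-?smoker)\b", re.I)),
    ("former",  re.compile(r"\b(former smoker|quit smoking|ex-?smoker)\b", re.I)),
    ("current", re.compile(r"\b(current(ly)?\s+smok\w+|\d+\s*pack[- ]?years?)\b", re.I)),
]

def smoking_status(note: str) -> str:
    """Return the first matching smoking-status category, else 'unknown'."""
    for label, pattern in SMOKING_RULES:
        if pattern.search(note):
            return label
    return "unknown"
```

Real systems combine many more patterns with negation and temporality handling, which is partly why machine learning approaches have become popular for the well-resourced categories.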
Background The purpose of this study was to compare the effects of scalp nerve block (SNB) and local anesthetic infiltration (LA) with 0.75% ropivacaine on postoperative inflammatory response, intraoperative hemodynamic response, and postoperative pain control in patients undergoing craniotomy. Methods Fifty-seven patients were admitted for elective craniotomy for surgical clipping of a cerebral aneurysm. They were randomly divided into three groups: group S (SNB with 15 mL of 0.75% ropivacaine), group I (LA with 15 mL of 0.75% ropivacaine), and group C (routine intravenous analgesia only). Pro-inflammatory cytokine levels in plasma over 72 h postoperatively, hemodynamic response to skin incision, and postoperative pain intensity were measured. Results SNB with 0.75% ropivacaine not only decreased plasma IL-6 levels 6 h after craniotomy but also decreased plasma CRP levels and increased plasma IL-10 levels 12 and 24 h after surgery compared with LA and routine analgesia. Mean arterial pressure increased significantly 2 and 5 min after incision and during dural opening in groups I and C compared with group S. Group S had lower postoperative pain intensity, a longer time to the first dose of oxycodone, lower oxycodone consumption, and a lower incidence of postoperative nausea and vomiting (PONV) through 48 h postoperatively than groups I and C. Conclusion Preoperative SNB attenuated the inflammatory response to craniotomy for cerebral aneurysms, blunted the hemodynamic response to scalp incision, and controlled postoperative pain better than LA or routine analgesia. Trial registration Clinicaltrials.gov NCT03073889 (PI: Xi Yang; date of registration: 08/03/2017).
Misinformation about COVID-19 has been prevalent on social media as the pandemic unfolds, and the associated risks are extremely high. Thus, it is critical to detect and combat such misinformation. Recently, deep learning models using natural language processing techniques, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved great success in detecting misinformation. In this paper, we propose an explainable natural language processing model based on DistilBERT and SHAP (Shapley Additive exPlanations), chosen for their efficiency and effectiveness, to combat misinformation about COVID-19. First, we collected a dataset of 984 fact-checked claims about COVID-19. By augmenting the data using back-translation, we doubled the sample size of the dataset, and the DistilBERT model was able to obtain good performance.
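The back-translation augmentation step can be sketched as follows. Here `translate` is a stand-in for a real machine-translation model (e.g., a pretrained seq2seq system) supplied by the caller; the pivot language and the data shape are assumptions for illustration.

```python
def back_translate_augment(claims, translate):
    """Return the original claims plus one back-translated paraphrase each.

    claims: list of (text, label) pairs.
    translate: callable (text, src=..., tgt=...) -> translated text.
    """
    augmented = list(claims)
    for text, label in claims:
        pivot = translate(text, src="en", tgt="de")        # English -> pivot language
        paraphrase = translate(pivot, src="de", tgt="en")  # pivot -> English
        # The paraphrase keeps the original label, doubling the dataset.
        augmented.append((paraphrase, label))
    return augmented
```

Because each original example contributes exactly one paraphrase, the augmented dataset is twice the size of the input, matching the doubling described above.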
Clinical studies, especially randomized, controlled trials, are essential for generating evidence for clinical practice. However, generalizability is a long-standing concern when applying trial results to real-world patients. Generalizability assessment is thus important; nevertheless, it is not consistently practiced. We performed a systematic review to understand the practice of generalizability assessment. We identified 187 relevant articles and systematically organized these studies in a taxonomy with three dimensions: (i) data availability (i.e., before or after the trial: a priori vs. a posteriori generalizability); (ii) result outputs (i.e., score vs. nonscore); and (iii) populations of interest. We further reported disease areas, underrepresented subgroups, and types of data used to profile target populations. We observed an increasing trend of generalizability assessments, but <30% of studies reported positive generalizability results. As a priori generalizability can be assessed using only study design information (primarily eligibility criteria), it gives investigators a golden opportunity to adjust the study design before the trial starts. Nevertheless, <40% of the studies in our review assessed a priori generalizability. With the wide adoption of electronic health record systems, rich real-world patient databases are increasingly available for generalizability assessment; however, informatics tools are lacking to support the adoption of generalizability assessment practice.
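As a toy illustration of score-based a priori generalizability assessment, one can compute the fraction of a real-world target population that satisfies a trial's eligibility criteria. All patient fields and criteria below are hypothetical, and real assessments use far richer EHR-derived cohorts.

```python
def generalizability_score(patients, criteria):
    """Fraction of the target population meeting all eligibility criteria.

    patients: list of dicts describing real-world patients.
    criteria: list of predicates, each mapping a patient dict to bool.
    """
    if not patients:
        return 0.0
    eligible = sum(all(rule(p) for rule in criteria) for p in patients)
    return eligible / len(patients)

# Hypothetical eligibility criteria: age 18-75 and no renal failure.
criteria = [
    lambda p: 18 <= p["age"] <= 75,
    lambda p: not p["renal_failure"],
]

cohort = [
    {"age": 45, "renal_failure": False},
    {"age": 80, "renal_failure": False},  # excluded by age
    {"age": 60, "renal_failure": True},   # excluded by comorbidity
    {"age": 30, "renal_failure": False},
]
print(generalizability_score(cohort, criteria))  # 0.5
```

A low score before enrollment signals that the eligibility criteria may be overly restrictive, which is the window for design adjustment the review highlights.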
Objective The goal of this study is to explore transformer-based models (eg, Bidirectional Encoder Representations from Transformers [BERT]) for clinical concept extraction and develop an open-source package with pretrained clinical models to facilitate concept extraction and other downstream natural language processing (NLP) tasks in the medical domain. Methods We systematically explored 4 widely used transformer-based architectures, including BERT, RoBERTa, ALBERT, and ELECTRA, for extracting various types of clinical concepts using 3 public datasets from the 2010 and 2012 i2b2 challenges and the 2018 n2c2 challenge. We examined general transformer models pretrained using general English corpora as well as clinical transformer models pretrained using a clinical corpus and compared them with a long short-term memory conditional random fields (LSTM-CRFs) model as a baseline. Furthermore, we integrated the 4 clinical transformer-based models into an open-source package. Results and Conclusion The RoBERTa-MIMIC model achieved state-of-the-art performance on 3 public clinical concept extraction datasets with F1-scores of 0.8994, 0.8053, and 0.8907, respectively. Compared to the baseline LSTM-CRFs model, RoBERTa-MIMIC remarkably improved the F1-score by approximately 4% and 6% on the 2010 and 2012 i2b2 datasets. This study demonstrated the efficiency of transformer-based models for clinical concept extraction. Our methods and systems can be applied to other clinical tasks. The clinical transformer package with 4 pretrained clinical models is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER. We believe this package will improve current practice on clinical concept extraction and other tasks in the medical domain.
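Downstream of any of these token-classification models, the per-token BIO labels must be collapsed into concept spans before F1 can be computed. A minimal decoder might look like this; the entity-type names are illustrative, as the i2b2/n2c2 corpora define their own label sets.

```python
def bio_to_spans(labels):
    """Collapse per-token BIO labels into (entity_type, start, end) spans.

    labels: list like ['B-Problem', 'I-Problem', 'O'].
    Returns spans with end indices exclusive.
    """
    spans, start, etype = [], None, None
    for i, label in enumerate(labels):
        if label.startswith("B-"):
            if start is not None:          # close any span already open
                spans.append((etype, start, i))
            start, etype = i, label[2:]    # open a new span
        elif label.startswith("I-") and start is not None and etype == label[2:]:
            continue                       # span continues
        else:                              # 'O' or an inconsistent I- tag
            if start is not None:
                spans.append((etype, start, i))
            start, etype = None, None
    if start is not None:                  # close a span reaching the end
        spans.append((etype, start, len(labels)))
    return spans
```

Strict span-level F1 then counts a prediction as correct only when type, start, and end all match the gold span, which is the convention behind the scores reported above.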
Driver assistance systems, also called automated driving systems, allow drivers to immerse themselves in non-driving-related tasks. Unfortunately, drivers may not trust the automated driving system, which prevents them from either handing over the driving task or fully focusing on the secondary task. We assert that enhancing situational awareness can increase a driver's trust in automation and lead to better secondary task performance. This study manipulated drivers' situational awareness by providing them with different types of information: the control condition provided no information to the driver, the low condition provided a status update, and the high condition provided a status update plus a suggested course of action. Data collected included measures of trust, trusting behavior, and task performance through surveys, eye-tracking, and heart rate data. Results show that situational awareness both promoted and moderated the impact of trust in the automated vehicle, leading to better secondary task performance. This result was evident in measures of self-reported trust and trusting behavior.
In conditionally automated driving, drivers have difficulty taking over control when requested. To address this challenge, we aimed to predict drivers' takeover performance before the issue of a takeover request (TOR) by analyzing drivers' physiological data and external environment data. We used data sets from two human-in-the-loop experiments, wherein drivers engaged in non-driving-related tasks (NDRTs) were requested to take over control from automated driving in various situations. Drivers' physiological data included heart rate indices, galvanic skin response indices, and eye-tracking metrics. Driving environment data included scenario type, traffic density, and TOR lead time. Drivers' takeover performance was categorized as good or bad according to their driving behaviors during the transition period and was treated as the ground truth. Using six machine learning methods, we found that the random forest classifier performed the best and was able to predict drivers' takeover performance when they were engaged in NDRTs with different levels of cognitive load. We recommended 3 s as the optimal time window to predict takeover performance using the random forest classifier, with an accuracy of 84.3% and an F1-score of 64.0%. Our findings have implications for the algorithm development of driver state detection and the design of adaptive in-vehicle alert systems in conditionally automated driving.
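As a worked example of the evaluation metrics reported above, accuracy and F1 for a binary takeover-performance classifier follow directly from confusion-matrix counts. The counts below are invented toy numbers chosen to land near the reported F1 of 64.0%; they are not the study's data.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and F1 for a binary classifier from confusion counts.

    F1 is computed on the positive class (here, e.g., 'bad takeover').
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Toy confusion counts for 100 takeover events (hypothetical).
acc, f1 = accuracy_and_f1(tp=16, fp=9, fn=9, tn=66)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")  # accuracy=0.820, F1=0.640
```

The gap between high accuracy and a lower F1 is typical of imbalanced takeover data, where the minority "bad takeover" class dominates the F1 calculation but not the accuracy.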