Objective
We conducted a systematic review to assess the effect of natural language processing (NLP) systems on the accuracy and efficiency of eligibility prescreening during the clinical research recruitment process.

Materials and Methods
Guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting standards for systematic reviews, a protocol for study eligibility was developed a priori and registered in the PROSPERO database. Using predetermined inclusion criteria, studies published from database inception through February 2021 were identified from 5 databases. The Joanna Briggs Institute Critical Appraisal Checklist for Quasi-experimental Studies was adapted to determine the study quality and risk of bias of the included articles.

Results
Eleven studies representing 8 unique NLP systems met the inclusion criteria. These studies demonstrated moderate study quality and exhibited heterogeneity in study design, setting, and intervention type. All 11 studies evaluated the NLP system’s performance in identifying eligible participants; 7 studies evaluated the system’s impact on time efficiency; 4 evaluated its impact on workload; and 2 evaluated its impact on recruitment.

Discussion
The use of NLP systems for clinical research eligibility prescreening is an understudied but promising area, and further research is needed to assess their impact on real-world adoption. Future studies should focus on continuing to develop and evaluate relevant NLP systems to improve enrollment in clinical studies.

Conclusion
Understanding the role of NLP systems in improving eligibility prescreening is critical to advancing clinical research recruitment.
Recent advances in large language models (LLMs) have demonstrated remarkable successes in zero- and few-shot performance on various downstream tasks, paving the way for applications in high-stakes domains. In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains. We conduct both automatic and human evaluations, covering several dimensions of summary quality. Our study demonstrates that automatic metrics often correlate poorly with the quality of summaries. Furthermore, informed by our human evaluations, we define a terminology of error types for medical evidence summarization. Our findings reveal that LLMs can be susceptible to generating factually inconsistent summaries and making overly convincing or uncertain statements, leading to potential harm through misinformation. Moreover, we find that models struggle to identify salient information and are more error-prone when summarizing over longer textual contexts.
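The weak correlation between automatic metrics and summary quality noted above can be illustrated with a minimal sketch: a unigram-overlap F1 score (in the spirit of ROUGE-1, computed here from scratch on invented example sentences) barely penalizes a summary that flips a single clinically critical word, which is exactly the kind of factual inconsistency the human evaluations surfaced.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Invented example: one flipped word reverses the clinical claim.
reference = "the drug reduced mortality in adults"
faithful = "the drug reduced mortality in adults"
contradicting = "the drug increased mortality in adults"

print(rouge1_f1(reference, faithful))       # 1.0
print(rouge1_f1(reference, contradicting))  # still high despite the factual flip
```

Five of the six unigrams still match, so the contradicting summary scores about 0.83 — indistinguishable from a good paraphrase by lexical overlap alone.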
The presence of stigmatizing language in the electronic health record (EHR) has been used to measure implicit biases that underlie health inequities. The purpose of this study was to identify the presence of stigmatizing language in the clinical notes of pregnant people during the birth admission. We conducted a qualitative analysis of N = 1117 birth admission EHR notes from two urban hospitals in 2017. We identified stigmatizing language categories, such as Disapproval (39.3%), Questioning patient credibility (37.7%), Difficult patient (21.3%), Stereotyping (1.6%), and Unilateral decisions (1.6%), in 61 notes (5.4%). We also defined a new stigmatizing language category indicating Power/privilege. This was present in 37 notes (3.3%) and signaled approval of social status, upholding a hierarchy of bias. Stigmatizing language was identified most frequently in birth admission triage notes (16%) and least frequently in social work initial assessments (13.7%). We found that clinicians from various disciplines recorded stigmatizing language in the medical records of birthing people. This language was used to question birthing people's credibility and to convey disapproval of their decision-making abilities for themselves or their newborns. We reported a Power/privilege language bias in the inconsistent documentation of traits considered favorable for patient outcomes (e.g., employment status). Future work on stigmatizing language may inform tailored interventions to improve perinatal outcomes for all birthing people and their families.
Objective
To combine machine efficiency and human intelligence for converting complex clinical trial eligibility criteria text into cohort queries.

Materials and Methods
Criteria2Query (C2Q) 2.0 was developed to enable real-time user intervention for criteria selection and simplification, parsing error correction, and concept mapping. The accuracy, precision, recall, and F1 score of the enhanced modules for negation scope detection, temporal normalization, and value normalization were evaluated against a previously curated gold standard: the annotated eligibility criteria of 1010 COVID-19 clinical trials. Usability and usefulness were evaluated by 10 research coordinators in a task-oriented usability evaluation using 5 Alzheimer’s disease trials. Data were collected through user interaction logging, a demographic questionnaire, the Health Information Technology Usability Evaluation Scale (Health-ITUES), and a feature-specific questionnaire.

Results
The accuracies of negation scope detection, temporal normalization, and value normalization were 0.924, 0.916, and 0.966, respectively. C2Q 2.0 achieved a moderate usability score (3.84 out of 5) and a high learnability score (4.54 out of 5). On average, 9.9 modifications were made per clinical study. Experienced researchers made more modifications than novice researchers. The most frequent modification was deletion (5.35 per study). Furthermore, the evaluators favored the cohort queries resulting from modifications (score 4.1 out of 5) and the user engagement features (score 4.3 out of 5).

Discussion and Conclusion
Features that engage domain experts and overcome the limitations of automated machine output were shown to be useful and user-friendly. We conclude that human–computer collaboration is key to improving the adoption and user-friendliness of natural language processing.
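The module evaluation described above reports accuracy, precision, recall, and F1 against gold-standard annotations. A minimal sketch of how such span-level scores are typically computed (the `span_prf1` helper and the character-offset spans are hypothetical illustrations, not C2Q's actual evaluation code):

```python
def span_prf1(gold: set, predicted: set) -> tuple:
    """Exact-match precision, recall, and F1 for predicted spans vs a gold standard."""
    tp = len(gold & predicted)       # spans found in both sets
    fp = len(predicted - gold)       # spurious predictions
    fn = len(gold - predicted)       # missed gold spans
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical negation scopes as (start, end) character offsets.
gold = {(0, 15), (20, 34), (40, 52)}
pred = {(0, 15), (20, 34), (55, 60)}

precision, recall, f1 = span_prf1(gold, pred)
```

With two of three spans matched exactly, precision, recall, and F1 all come out to 2/3; partial-credit variants that reward overlapping but inexact spans are also common in scope-detection evaluations.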
Clinical trial eligibility criteria are important for selecting the right participants for clinical trials. However, they are often complex and not computable. This paper presents the participatory design of a human-computer collaboration method for criteria simplification that includes natural language processing followed by user-centered eligibility criteria simplification. A case study on the ARCADIA trial shows how criteria were simplified for structured database querying by clinical researchers and identifies rules for criteria simplification and concept normalization.
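To make concrete what converting a simplified criterion into a structured, computable query can look like, here is a minimal hypothetical sketch: a regular expression maps simple numeric criteria to query clauses, while anything it cannot parse is deferred to human review. The pattern and the clause schema are illustrative assumptions, not the rules identified in the ARCADIA case study.

```python
import re
from typing import Optional

# Hypothetical pattern for simple numeric criteria like "Age >= 18".
CRITERION_RE = re.compile(r"(?P<field>[A-Za-z]+)\s*(?P<op>>=|<=|>|<|=)\s*(?P<value>\d+)")

def parse_criterion(text: str) -> Optional[dict]:
    """Map a simplified eligibility criterion to a structured query clause."""
    match = CRITERION_RE.search(text)
    if match is None:
        return None  # complex criteria fall back to human review
    return {
        "field": match.group("field").lower(),
        "op": match.group("op"),
        "value": int(match.group("value")),
    }

print(parse_criterion("Age >= 18"))
# → {'field': 'age', 'op': '>=', 'value': 18}
print(parse_criterion("History of atrial fibrillation"))
# → None (routed to a clinical researcher for manual simplification)
```

The None branch is where human-computer collaboration enters: criteria the automated step cannot normalize are exactly the ones the user-centered simplification workflow hands to clinical researchers.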