Identifying incarceration status in the electronic health record using large language models in emergency department settings

Huang, Thomas; Socrates, Vimig; Gilson, Aidan; Safranek, Conrad; Chi, Ling; Wang, Emily A.; Puglisi, Lisa B.; Brandt, Cynthia; Taylor, R. Andrew; Wang, Karen

doi:10.1017/cts.2024.496

J. Clin. Trans. Sci.

2024

Cited by 2 publications

References 28 publications

Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review

Du, Wang, Zhou et al. 2024

Preprint

Background: Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) is still rare and presents numerous challenges.

Objective: This study aims to systematically review the use of generative LLMs in patient care-related topics involving EHRs, summarize the challenges faced, and suggest future directions.

Methods: A Boolean search for peer-reviewed articles was conducted in May 2024 using PubMed and Web of Science to include research articles published since 2023, which was one month after the release of ChatGPT. The search results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and extracted bibliometric and clinically relevant information. Only papers utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal EHR data, and evaluation metrics. Additionally, we identified current challenges in applying LLMs in clinical settings as reported by the included papers and proposed future directions.

Results: The initial search identified 6,328 unique studies, with 76 studies included after eligibility screening. Of these, 67 studies (88.2%) employed zero-shot prompting; five of them reported 100% accuracy on five specific clinical tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally, finding that prompt engineering improved performance, with one study noting a non-linear relationship between the number of examples in a prompt and performance improvement. Eight studies explored fine-tuning generative LLMs; all reported performance improvements on specific tasks, but three of them noted potential performance degradation after fine-tuning on certain tasks. Only two studies utilized multimodal data, which improved LLM-based decision-making and enabled accurate rare disease diagnosis and prognosis. The studies employed 55 different evaluation metrics for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias, with one detecting no bias and the other finding that male patients received more appropriate clinical decision-making suggestions. Six studies identified hallucinations, such as fabricating patient names in structured thyroid ultrasound reports. Additional challenges included, but were not limited to, the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses.

Conclusion: Our review indicates that few studies have employed advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to challenges such as bias, hallucinations, and impersonal responses.

Optimizing Clinical Data Availability: Extracting Pulmonary Embolism Diagnoses from Radiology Impressions with GPT-4o

Mahyoub, Dougherty, Shukla 2024

Preprint

Background: Pulmonary embolism (PE) is a life-threatening condition that requires timely diagnosis to reduce mortality. Radiology reports, particularly the Impression sections, play a critical role in diagnosing PE. However, manually extracting this information from large volumes of reports is challenging. This study aims to develop an advanced natural language processing (NLP) system using GPT-4o to automatically extract PE diagnoses from radiology report impressions, enhancing clinical workflows and decision-making. Materials and Methods: We developed two text classification models: a fine-tuned Clinical Longformer (as a baseline model) and GPT-4o. Models were trained using 1,000 radiology report impressions and validated on 200 samples, with a post-deployment evaluation conducted using 500 operational records. The primary dataset was sourced from an electronic medical record relational database, and key metrics such as sensitivity, specificity, and F1 score were used to evaluate model performance. Results: GPT-4o achieved superior performance with 100% sensitivity, specificity, and F1 score, outperforming the Clinical Longformer. Post-deployment, GPT-4o continued to perform flawlessly, identifying all positive PE cases without false positives or false negatives. The model successfully streamlined the clinical workflow, reducing the burden of manual review and enhancing diagnostic accuracy.
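For readers less familiar with the metrics named in this abstract, the following is a minimal sketch of how sensitivity, specificity, and F1 score are computed for a binary classifier. It is not the study's code. The function name `binary_metrics` and the toy labels are hypothetical and chosen only for illustration; the labels are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, f1) for binary labels (1 = positive, 0 = negative).

    Hypothetical helper for illustration; not taken from the cited study.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Sensitivity (recall): share of actual positives that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and recall, written in count form.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return sensitivity, specificity, f1


# Toy labels (made up): 3 positives, 2 negatives, with one miss and one false alarm.
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1]
sens, spec, f1 = binary_metrics(toy_true, toy_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} f1={f1:.3f}")
```

A result of 100% on all three metrics, as the abstract reports, corresponds to zero false positives and zero false negatives on the evaluated sample.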
