2024
DOI: 10.1101/2024.03.21.24304667
Preprint

Evaluating Accuracy and Reproducibility of Large Language Model Performance in Pharmacy Education

Amoreena Most,
Mengxuan Hu,
Huibo Yang
et al.

Abstract: The purpose of this study was to compare the performance of ChatGPT (GPT-3.5), ChatGPT (GPT-4), Claude2, Llama2-7b, and Llama2-13b on 219 multiple-choice questions focusing on critical care pharmacotherapy. To further assess whether prompt engineering can improve LLM reasoning and performance, we examined responses generated with a zero-shot approach, Chain-of-Thought (CoT) prompting, and a custom-built GPT (PharmacyGPT). A set of 219 multiple-choice questions focused on critical care pharmacotherapy topics used in D…
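The abstract describes scoring several LLMs on the same multiple-choice set under different prompting strategies. The sketch below is a rough illustration of how such a comparison might be scored; it is not taken from the paper, and the MCQ structure, prompt wording, and ask_model callable are assumptions standing in for the specific model APIs (GPT-3.5/4, Claude2, Llama2) the authors evaluated.

```python
# Minimal sketch (assumptions, not the authors' code) of scoring multiple-choice
# accuracy under zero-shot vs. Chain-of-Thought (CoT) prompting.
from dataclasses import dataclass
from typing import Callable


@dataclass
class MCQ:
    stem: str
    options: dict[str, str]  # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    answer: str              # correct option letter


def build_prompt(q: MCQ, cot: bool) -> str:
    """Assemble a question prompt, optionally asking the model to reason step by step."""
    opts = "\n".join(f"{k}. {v}" for k, v in q.options.items())
    instruction = (
        "Think step by step, then give the single best answer letter."
        if cot
        else "Give the single best answer letter."
    )
    return f"{q.stem}\n{opts}\n{instruction}"


def accuracy(questions: list[MCQ], ask_model: Callable[[str], str], cot: bool) -> float:
    """ask_model wraps whichever LLM API is being tested and returns its text reply."""
    correct = 0
    for q in questions:
        reply = ask_model(build_prompt(q, cot))
        # Naive parse: take the first option letter that appears in the reply.
        predicted = next((c for c in reply if c in q.options), None)
        correct += predicted == q.answer
    return correct / len(questions)
```

To probe reproducibility as well as accuracy, the same accuracy() call could be repeated over multiple trials per model and prompting strategy and the spread of scores compared.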

Cited by 0 publications · References 19 publications