2022
DOI: 10.1101/2022.12.19.22283643
Preprint

Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models

Abstract: We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and pot…

Cited by 176 publications (170 citation statements)
References 19 publications (12 reference statements)
“…They see "potential applications of ChatGPT as a medical education tool" (Gilson et al, 2022). Kung et al (2022) also tested ChatGPT on the USMLE and arrived at similar results and conclusions. Bommarito & Katz (2022) found earlier that GPT-3 was able to pass a U.S. Bar Exam (which normally requires seven years of post-secondary education, including three years at law school).…”
Section: Methods and Literature Review
confidence: 70%
“…Only a fraction of these random tests is discussed in the next section. Unlike other recent academic articles and editorials (King & ChatGPT, 2023; Kung et al, 2022; O'Connor & ChatGPT, 2023), ChatGPT is not a co-author of our article, and we used the chatbot only very sparingly for brainstorming.…”
Section: Methods and Literature Review
confidence: 99%
“…We suggest clear disclosure when a manuscript is written with assistance from ChatGPT; 26 some have even included it as a co-author. 27 Reassuringly, there are patterns that allow it to be detected by AI output detectors. Though there is ongoing work to embed watermarks in output, until this is standardized and robust against scrubbing, we suggest running journal and conference abstract submissions through AI output detectors as part of the research editorial process to protect from targeting by organizations such as paper mills.…”
Section: Discussion
confidence: 99%
“…For example, when given a mixture of original and ChatGPT-generated medical scientific abstracts, blinded medical researchers could identify only 68% of the ChatGPT-generated abstracts as fabricated [6]. Other research evaluated the performance of ChatGPT on the United States Medical Licensing Exam (USMLE) and found that the tool performed near or at the passing threshold [7]. Taken together, there is anecdotal evidence that ChatGPT generates content that is very similar to and can hardly be discriminated from human-generated content.…”
Section: Related Literature
confidence: 99%