2023
DOI: 10.3389/fmed.2023.1240915

Evaluating the performance of ChatGPT-4 on the United Kingdom Medical Licensing Assessment

U Hin Lai,
Keng Sam Wu,
Ting-Yu Hsu
et al.

Abstract: Introduction: Recent developments in artificial intelligence large language models (LLMs), such as ChatGPT, have allowed for the understanding and generation of human-like text. Studies have found that LLMs can perform well in various examinations, including law, business and medicine. This study aims to evaluate the performance of ChatGPT in the United Kingdom Medical Licensing Assessment (UKMLA). Methods: Two publicly available UKMLA papers consisting of 200 single-best-answer (SBA) questions were screened. Ni…

Cited by 16 publications
(17 citation statements)
References 32 publications
“…Variability in ChatGPT performance across varying disciplines was shown in previous studies as follows. A recent study by Lai et al showed that ChatGPT-4 had an average score of 76.3% in the United Kingdom Medical Licensing Assessment, a national undergraduate medical exit exam (Lai et al, 2023). Importantly, the study revealed varied performance across medical specialties, with weaker results in gastrointestinal/hepatology, endocrine/metabolic, and clinical hematology domains as opposed to better performance in the mental health, cancer, and cardiovascular domains (Lai et al, 2023).…”
Section: Discussion (mentioning)
confidence: 99%
“…A recent study by Lai et al showed that ChatGPT-4 had an average score of 76.3% in the United Kingdom Medical Licensing Assessment, a national undergraduate medical exit exam (Lai et al, 2023). Importantly, the study revealed varied performance across medical specialties, with weaker results in gastrointestinal/hepatology, endocrine/metabolic, and clinical hematology domains as opposed to better performance in the mental health, cancer, and cardiovascular domains (Lai et al, 2023). Additionally, a similar discrepancy in ChatGPT-4 performance across medical subjects (albeit lacking statistical significance) was noticed in a study by Gobira et al which utilized the 2022 Brazilian National Examination for Medical Degree Revalidation, with worse performance in preventive medicine (Gobira et al, 2023).…”
Section: Discussion (mentioning)
confidence: 99%
“…[19] ChatGPT performed well (76.3%) on the UKMLA.[20] In contrast, the performance of ChatGPT vs dental students on a medical microbiology MCQ exam found that ChatGPT 3.5 correctly answered 64 out of 80 MCQs (80%), scoring 80.5 out of 100 which was below the student average of 86.21 out of 100.[29]…”
Section: Discussion (mentioning)
confidence: 93%
“…Extensive research has shown that ChatGPT, particularly its most recent version GPT-4, excels across various standardized tests. This includes the United States Medical Licensing Examination [22, 23, 24, 25]; medical licensing tests from different countries [26, 27, 28, 29, 30]; and exams related to specific fields such as psychiatry [31], nursing [32], dentistry [33], pathology [34], pharmacy [35], urology [36], gastroenterology [37], parasitology [38], and ophthalmology [39]. Additionally, there is evidence of ChatGPT’s ability to create discharge summaries and operative reports [40, 41], record patient histories of present illness [42], and enhance the documentation process for informed consent [43], although its effectiveness requires further improvement.…”
Section: Introduction (mentioning)
confidence: 99%