2024
DOI: 10.1177/23821205241238641
Can ChatGPT-3.5 Pass a Medical Exam? A Systematic Review of ChatGPT's Performance in Academic Testing

Anusha Sumbal, Ramish Sumbal, Alina Amir

Abstract: OBJECTIVE We, therefore, aim to conduct a systematic review to assess the academic potential of ChatGPT-3.5, along with its strengths and limitations when taking medical exams. METHOD Following PRISMA guidelines, a systematic search of the literature was performed using the electronic databases PubMed/MEDLINE, Google Scholar, and Cochrane. Articles from their inception until April 4, 2023, were queried. A formal narrative analysis was conducted by systematically arranging similarities and differences between individu…

Cited by 12 publications (5 citation statements)
References 29 publications
“…An extensive body of literature has found that LLMs, such as ChatGPT, can successfully pass medical examinations [28], albeit with varying degrees of heterogeneity and variability [29], exhibiting strong abilities in explanation, reasoning, memory, and accuracy. On the other hand, LLMs struggle with image-based questions [30] and, in some circumstances, lack insight and critical thinking skills [31]. Some of the studies that exploit quizzes/vignettes/validated knowledge surveys [32,33] have quantified the fluency and accuracy of AI-based tools using validated and reliable instruments, like the “Artificial Intelligence Performance Instrument” (AIPI) [32].…”
Section: The Quiz/Vignette/Knowledge Survey Paradigm
confidence: 99%
“…An extensive body of literature has found that LLMs such as ChatGPT can successfully pass medical examinations [28], although with varying degrees of heterogeneity and variability [29], exhibiting strong abilities in explanation, reasoning, memory, and accuracy. On the other hand, LLMs struggle with image-based questions [30] and, in some circumstances, lack insight and critical thinking skills [31].…”
Section: Implementing “Verification Paradigms”: A Comprehensive Evalu…
confidence: 99%
“…To the best of our knowledge, three systematic reviews have explored ChatGPT's performance in medical licensing exams [58][59][60].…”
Section: Literature Review
confidence: 99%
“…A study from Pakistan collected literature up to April 2023, focusing on the performance of GPT-3.5 in various medical licensing exams worldwide [59]. However, with the advent of the more advanced GPT-4, more studies have focused on GPT-4.…”
Section: Literature Review
confidence: 99%
“…[6][7][8][9][10][11][12][13] As a more objective and generalizable benchmark for performance, studies have also explored LLMs' impressive performance on standardized clinical examinations such as the United States Medical Licensing Examination (USMLE) and specialty examinations. [14][15][16][17][18][19][20][21] Advancements in the capabilities of LLMs, such as image recognition, have opened new avenues for innovation and research into their potential applications in clinical care. Furthermore, model prompting strategies, such as prompt engineering, few-shot learning, and retrieval augmented generation (RAG), have shown promise in enhancing the performance of generalist foundation models on science and general medical knowledge benchmarks.…”
Section: Introduction
confidence: 99%