2023
DOI: 10.2196/48002

Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study

Abstract: Background: The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied. Objective: This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages. Methods: This …

Cited by 122 publications (97 citation statements)
References 23 publications
“…Recent studies show GPT-4 outperformed GPT-3.5 by 24%–30% in various medical examinations. 13,14,21,23 These findings indicate a significant enhancement in the model's capabilities. However, a study using the American College of Gastroenterology Test found GPT-3.5 and GPT-4 had scores of 65% and 62%, respectively.…”
Section: Discussion; citation type: mentioning
confidence: 90%
“…Yet, the AI model struggles with more complex tasks requiring advanced comprehension, analytical abilities, and precise calculations. As indicated by a number of studies, 16,20-22 ChatGPT's limitations in handling scientific and mathematical applications, particularly those demanding high-level cognitive engagement, become evident. Fluctuations in accuracy may be linked to the nature of subfield questions, even without explicit categorization.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…GPT can also understand languages other than English. The latest model, GPT-4, has been reported to achieve passing scores in medical licensing examinations in non-English speaking countries such as Japan, China, Poland, and Peru [8][9][10][11][12][13].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Dozens of articles followed in a short time, focusing on the national medical licensing examinations of various countries and the board examinations of various specialties. 8,9…”
Citation type: mentioning
confidence: 99%