Background
The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied.
Objective
This study compared the performance of GPT-3.5 and GPT-4 (Generative Pre-trained Transformer) on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages.
Methods
This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions.
Results
The results indicated that GPT-4 outperformed GPT-3.5 in accuracy across general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages.
Conclusions
GPT-4 could become a valuable tool for medical education and clinical support in non–English-speaking regions, such as Japan.