ChatGPT failed Taiwan’s Family Medicine Board Exam

Weng, Tzu-Ling; Wang, Yingmei; Chang, Sun Ju; Chen, Tzeng Ji; Hwang, Shinn‐Jang

doi:10.1097/jcma.0000000000000946

Cited by 54 publications

(27 citation statements)

References 20 publications


“…It did not pass the 2023 Japanese National Medical Licensing Examination, with an overall correct answer rate of 55.0%. Furthermore, it did not succeed in the Taiwan Family Medicine Board Exam [24], Taiwan internal medicine exams [25], the Taiwan Pharmacist Licensing Examination [21], the Chinese Medical Licensing Examination, Chinese Pharmacist Licensing Examination, and Chinese Nurse Licensing Examination [26], and the Chinese medical licensing exams in simplified Chinese [17]. Nevertheless, our results indicate that ChatGPT attained an accuracy of up to 93.75% in the Taiwan medical licensing exams, though there was a noticeable drop in performance in the July 2023 exam.…”

Section: Discussion (mentioning)

confidence: 98%

Exploring the proficiency of ChatGPT-4: An evaluation of its performance in the Taiwan advanced medical licensing examination

Lin, Chan, Hsu, et al. (2024). DIGITAL HEALTH.


Background: Taiwan is well known for its quality healthcare system. The country's medical licensing exams offer a way to evaluate ChatGPT's medical proficiency.

Methods: We analyzed exam data from February 2022, July 2022, February 2023, and July 2023. Each exam included four papers with 80 single-choice questions, grouped as descriptive or picture-based. We used ChatGPT-4 for evaluation. Incorrect answers prompted a “chain of thought” (CoT) approach. Accuracy rates were calculated as percentages.

Results: ChatGPT-4's accuracy in the medical exams ranged from 63.75% to 93.75% (February 2022 to July 2023). The highest accuracy (93.75%) was in the February 2022 Medicine Exam (3). The subjects with the highest misanswered rates were ophthalmology (28.95%), breast surgery (27.27%), plastic surgery (26.67%), orthopedics (25.00%), and general surgery (24.59%). With CoT prompting, the accuracy of the CoT-prompted answers ranged from 0.00% to 88.89%, and the final overall accuracy rate ranged from 90% to 98%.

Conclusion: ChatGPT-4 passed Taiwan's medical licensing exams. With the “chain of thought” prompt, its accuracy improved to over 90%.



“…Therefore, it supported the fact that many attending staff and residents felt that the response by ChatGPT was superficial and did not show a deep understanding of the topic. For more advanced examination levels, such as resident-level examinations, ChatGPT performed more poorly [7,34,35]. For example, ChatGPT's score in the plastic surgery in-training examination was ranked at the 49th percentile compared with first-year residents but at the zeroth percentile, significantly worse, compared with fifth- and sixth-year residents [9].…”

Section: Discussion (mentioning)

confidence: 98%

Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students’ and Physicians’ Perceptions

Tangadulrat, Sono, Tangtrakulwanich (2023). JMIR Med Educ.


Background: ChatGPT is a well-known large language model–based chatbot that could be used in many aspects of the medical field. However, some physicians are still unfamiliar with ChatGPT and are concerned about its benefits and risks.

Objective: We aimed to evaluate the perceptions of physicians and medical students toward using ChatGPT in the medical field.

Methods: A web-based questionnaire was sent to medical students, interns, residents, and attending staff with questions regarding their perceptions of using ChatGPT in clinical practice and medical education. Participants were also asked to rate their perception of a ChatGPT-generated response about knee osteoarthritis.

Results: Participants included 124 medical students, 46 interns, 37 residents, and 32 attending staff. After reading ChatGPT's response, 132 of the 239 (55.2%) participants gave a positive rating about using ChatGPT for clinical practice. The proportion of positive answers was significantly lower in graduated physicians (48/115, 42%) than in medical students (84/124, 68%; P<.001). Participants listed the lack of a patient-specific treatment plan, the lack of updated evidence, and a language barrier as ChatGPT's pitfalls. Regarding using ChatGPT for medical education, the proportion of positive responses was also significantly lower in graduated physicians (71/115, 62%) than in medical students (103/124, 83.1%; P<.001). Participants were concerned that ChatGPT's response was too superficial, might lack scientific evidence, and might need expert verification.

Conclusions: Medical students generally had a positive perception of using ChatGPT for guiding treatment and medical education, whereas graduated physicians were more cautious in this regard. Nonetheless, both medical students and graduated physicians positively perceived using ChatGPT to create patient educational materials.
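As a quick consistency check on the survey figures above, the sketch below recomputes each reported percentage from its raw counts. It assumes the graduated-physician group is the sum of interns, residents, and attending staff (46 + 37 + 32 = 115), which matches the denominators the abstract uses; the function and variable names are illustrative only and do not come from the study.

```python
# Recompute the proportions reported in the survey abstract from its raw counts.
# Assumption (hypothetical grouping): "graduated physicians" = interns + residents + attending staff.

def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator

# Group sizes reported in the abstract.
students, interns, residents, attendings = 124, 46, 37, 32
graduated = interns + residents + attendings  # 115 graduated physicians
total = students + graduated                  # 239 participants

# (label, positive responses, group size) as reported in the abstract.
reported = [
    ("clinical practice, all participants", 132, total),
    ("clinical practice, graduated physicians", 48, graduated),
    ("clinical practice, medical students", 84, students),
    ("medical education, graduated physicians", 71, graduated),
    ("medical education, medical students", 103, students),
]

# Print each proportion to one decimal place for comparison with the abstract.
for label, positive, n in reported:
    print(f"{label}: {positive}/{n} = {pct(positive, n):.1f}%")
```

Rounded to whole percentages, the computed values (41.7%, 67.7%, 61.7%) agree with the 42%, 68%, and 62% given in the abstract, and the one-decimal values agree with the reported 55.2% and 83.1%.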


“…Nuanced discrepancies in grammar rules and other aspects between the Chinese and English languages might affect ChatGPT’s effectiveness when used with Chinese. The current performance is restricted by the corpus, and further optimization and adjustment are required [17]. Consequently, the findings of this study provide an incomplete representation of ChatGPT’s overall performance level.…”

Section: Discussion (mentioning)

confidence: 99%

Performance of ChatGPT on Chinese Master’s Degree Entrance Examination in Clinical Medicine

Li, Bu, Shahjalal, et al. (2024). PLoS ONE.


Background: ChatGPT is a large language model designed to generate responses based on a contextual understanding of user queries and requests. This study used the entrance examination for the Master of Clinical Medicine in Traditional Chinese Medicine to assess the reliability and practicality of ChatGPT within the domain of medical education.

Methods: We selected 330 single- and multiple-choice questions, none of which included images or tables, from the 2021 and 2022 Chinese Master of Clinical Medicine comprehensive examinations. To ensure the test’s accuracy and authenticity, we preserved the original format of the question stems and answer options, without any modifications or explanations.

Results: Both ChatGPT-3.5 and GPT-4 attained average scores above the admission threshold. ChatGPT achieved its highest score in the Medical Humanities section, with a correct rate of 93.75%. However, ChatGPT-3.5 had its lowest accuracy (37.5%) in the Pathology section, while GPT-4 also had a relatively low accuracy (60.23%) in the Biochemistry section. An analysis by question type revealed that ChatGPT performs well on single-choice questions but poorly on multiple-choice questions.

Conclusion: ChatGPT exhibits a degree of medical knowledge and the capacity to aid in diagnosing and treating diseases. Nevertheless, improvements are needed to address its accuracy and reliability limitations. Its use must be accompanied by rigorous evaluation and oversight, along with proactive measures to overcome its current constraints.
