2023
DOI: 10.1101/2023.05.04.23289482
Preprint

GPT-4 outperforms ChatGPT in answering non-English questions related to cirrhosis

Abstract: Background and Objectives: Artificial intelligence is increasingly being employed in healthcare, raising concerns about the exacerbation of disparities. This study evaluates ChatGPT and GPT-4's ability to comprehend and respond to cirrhosis-related questions in English, Korean, Mandarin, and Spanish, addressing language barriers that may impact patient care. Methods: A set of 36 cirrhosis-related questions was translated into Korean, Mandarin, and Spanish and prompted to both ChatGPT and GPT-4 models. Non-Eng…
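The abstract outlines the prompting protocol: each of the 36 questions is translated into Korean, Mandarin, and Spanish and posed to both ChatGPT and GPT-4. A minimal sketch of that loop follows, assuming the OpenAI chat completions API; the model names, the sample question, its translations, and the printed output are illustrative assumptions, not the preprint's actual script or physician grading pipeline.

# Minimal sketch (not the authors' actual script): querying GPT-3.5 (ChatGPT)
# and GPT-4 with the same cirrhosis question in several languages via the
# OpenAI chat completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One example question rendered in the four study languages (translations are
# illustrative placeholders, not the study's validated translations).
question_by_language = {
    "English": "What lifestyle changes are recommended for patients with cirrhosis?",
    "Korean": "간경변증 환자에게 권장되는 생활 습관 변화는 무엇인가요?",
    "Mandarin": "肝硬化患者建议进行哪些生活方式的改变？",
    "Spanish": "¿Qué cambios en el estilo de vida se recomiendan para pacientes con cirrosis?",
}

MODELS = ["gpt-3.5-turbo", "gpt-4"]  # ChatGPT vs. GPT-4

for model in MODELS:
    for language, question in question_by_language.items():
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answer = response.choices[0].message.content
        # In the study, answers were graded for accuracy and comprehensiveness;
        # here the responses are simply printed for inspection.
        print(f"[{model} | {language}]\n{answer}\n")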

Cited by 17 publications (16 citation statements)
References 14 publications
“…14 Another study attempted the family medicine exam using ChatGPT and was successful, partly because the questions used were from an after-school exam for medical students and likely more accessible. 7 Second, ChatGPT's language database training may not contain enough information in traditional Chinese. 15,16 In the dynamic medical field, the inability to retrieve the latest information can lead to inaccurate ChatGPT responses.…”
Section: Discussion
confidence: 99%
“…14 Another study attempted the family medicine exam using ChatGPT and was successful, partly because the questions used were from an after-school exam for medical students and likely more accessible. 7 Second, ChatGPT’s language database training may not contain enough information in traditional Chinese, 15,16 leading to reduced accuracy and correctness in the answers when only Chinese questions are presented. Third, the Family Medicine Board Exam questions are mainly from the TAFM’s publications, including Chinese family medicine magazines and three major textbooks in the new 2023 edition ( Family Medicine , Community Medicine , and Family Doctor Clinical Practice , all in Chinese), which are not open access and not likely included in ChatGPT’s training database; this hinders the search for the most accurate answers.…”
Section: Discussion
confidence: 99%
“…The authors were primarily affiliated with institutions in the United States (n=47 of 122 different countries identified per publication, 38.5%), followed by Germany (n=11/122, 9%), Turkey (n=7/122, 5.7%), the United Kingdom (n=6/122, 4.9%), China/Australia/Italy (n=5/122, 4.1%, respectively), and 24 (n=36/122, 29.5%) other countries. Most studies examined one or more applications based on the GPT-3.5 architecture (n=66 of 124 different LLMs examined per study, 53.2%). 13,26–29,31–34,36–40,42–49,52–54,56–61,63,65–67,71,72,74,75,77,78,81–87…”
Section: Characteristics Of Included Studies
confidence: 99%
“…Moreover, ChatGPT-4 provides responses with medical accuracy in different languages (English, Korean, Mandarin, Spanish and many others), reaching a larger population and avoiding the need for translator software that could misspell medical language and make errors while translating [34].…”
Section: Enhancing Patient-physician Interactions
confidence: 99%