Artificial Intelligence and Patient Education: Examining the Accuracy and Reproducibility of Responses to Nutrition Questions Related to Inflammatory Bowel Disease by GPT-4

Samaan, Jamil S.; Issokson, Kelly; Feldman, Erin; Fasulo, Christina; Ng, Wee Han; Rajeev, Nithya; Hollander, Barbara; Yeo, Yee Hui; Vasiliauskas, Eric

doi:10.1101/2023.10.28.23297723

Cited by 1 publication (11 citation statements); references 38 publications

“…Table 1 summarizes the characteristics of the analyzed studies, including their setting, results, and conclusions. One study (n=1/89, 1.1%) was published in 2022 24 , 84 (n=84/89, 94.4%) in 2023 13,25–107 , and 4 (n=4/89, 4.5%) in 2024 108–111 (all of which were peer-reviewed publications of preprints published in 2023). Most studies were quantitative non-randomized (n=84/89, 94.4%) 13,25–27,29–101,103,104,106,107,109–111 , 4 (n=4/89, 4.5%) 28,102,105,108 had a qualitative study design, and one (n=1/89, 1.1%) 24 was quantitative randomized according to the MMAT 2018 criteria.…”

Section: Results (mentioning)

confidence: 99%

“…The authors were primarily affiliated with institutions in the United States (n=47 of 122 different countries identified per publication, 38.5%), followed by Germany (n=11/122, 9%), Turkey (n=7/122, 5.7%), the United Kingdom (n=6/122, 4.9%), China/Australia/Italy (n=5/122, 4.1%, respectively), and 24 (n=36/122, 29.5%) other countries. Most studies examined one or more applications based on the GPT-3.5 architecture (n=66 of 124 different LLMs examined per study, 53.2%) 13,26–29,31–34,36–40,42–49,52–54,56–61,63,65–67,71,72,74,75,77,78,81–89,91,92,94,95,97–100,102–104,106–109,111 , followed by GPT-4 (n=33/124, 26.6%) 13,25,27,29,30,34–36,41,43,50,51,54,55,58,61,64,68–70,74,76,79–81,83,87,89,90,93,96,98,99,101,105 , Bard (n=10/124, 8.1%; now known as Gemini) 33,48,49,55,73,74,80,87,94,99 , Bing Chat (n=7/124, 5.7%; now Microsoft Copilot) 49,51,55,73,94,99,110 , and other applications based on Bidirectional Encoder Representations from Transformers (BERT; n=4/124, 3...…”

Section: Results (mentioning)

confidence: 99%

“…Most studies examined one or more applications based on the GPT-3.5 architecture (n=66 of 124 different LLMs examined per study, 53.2%) 13,26–29,31–34,36–40,42–49,52–54,56–61,63,65–67,71,72,74,75,77,78,81–89,91,92,94,95,97–100,102–104,106–109,111 , followed by GPT-4 (n=33/124, 26.6%) 13,25,27,29,30,34–36,41,43,50,51,54,55,58,61,64,68–70,74,76,79–81,83,87,89,90,93,96,98,99,101,105 , Bard (n=10/124, 8.1%; now known as Gemini) 33,48,49,55,73,74,80,87,94,99 , Bing Chat (n=7/124, 5.7%; now Microsoft Copilot) 49,51,55,73,94,99,110 , and other applications based on Bidirectional Encoder Representations from Transformers (BERT; n=4/124, 3.2%) 13,83,84 , Large Language Model Meta-AI (LLaMA; n=3/124, 2.4%) 55 , or Claude by Anthropic (n=1/124, 0.8%) 55 . The majority of applications were p...…”

Section: Results (mentioning)

confidence: 99%

“…Most reports evaluated LLMs in English (n=88/89, 98.9%) 13,24–103,105–111 , followed by Arabic (n=2/84, 2.3%) 32,104 , Mandarin (n=2/84, 2.3%) 36,75 , and Korean or Spanish (n=1/89, 1.1%, respectively) 75 . The top-five specialties studied were ophthalmology (n=10/89, 11.2%) 37,40,48,51,65,74,97,98,100,101 , gastro-enterology (n=9/89, 10.1%) 25,32,34,36,39,61,62,72,96 , head and neck surgery/otolaryngology (n=8/89, 9%) 35,42,56,64,66,76,78,79 , and radiology 59,70,88–90,110 or plastic surgery 45,47,49,102,107,108 (n=6/89, 6.7%, respectively). A schematic illustration of the identified concepts of LLM applications in patient care is shown in Figure 2.…”

Section: Results (mentioning)

confidence: 99%

“…One study (n=1/89, 1.1%) was published in 2022 24 , 84 (n=84/89, 94.4%) in 2023 13,25–107 , and 4 (n=4/89, 4.5%) in 2024 108–111 (all of which were peer-reviewed publications of preprints published in 2023). Most studies were quantitative non-randomized (n=84/89, 94.4%) 13,25–27,29–101,103,104,106,107,109–111 , 4 (n=4/89, 4.5%) 28,102,105,108 had a qualitative study design, and one (n=1/89, 1.1%) 24 was quantitative randomized according to the MMAT 2018 criteria. However, the LLM outputs were often first analyzed quantitatively but followed by a qualitative analysis of certain responses.…”

Section: Characteristics of Included Studies (mentioning)

confidence: 99%

Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges

Busch, Hoffmann, Rueger, et al. 2024 (preprint)

The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care using a data-driven convergent synthesis approach. We searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4,349 initial records, 89 studies across 29 medical specialties were included, primarily examining models based on the GPT-3.5 (53.2%, n=66 of 124 different LLMs examined per study) and GPT-4 (26.6%, n=33/124) architectures in medical question answering, followed by patient information generation, including medical text summarization or translation, and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations included 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations included 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. In conclusion, this study is the first review to systematically map LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.
