2024
DOI: 10.1097/iop.0000000000002567

Evaluating the Accuracy of ChatGPT and Google BARD in Fielding Oculoplastic Patient Queries: A Comparative Study on Artificial versus Human Intelligence

Eman M. Al-Sharif,
Rafaella C. Penteado,
Nahia Dib El Jalbout
et al.

Abstract: Purpose: This study evaluates and compares the accuracy of responses from 2 artificial intelligence platforms to patients’ oculoplastics-related questions. Methods: Questions directed toward oculoplastic surgeons were collected, rephrased, and input independently into the ChatGPT-3.5 and BARD chatbots, using the prompt: “As an oculoplastic surgeon, how can I respond to my patient’s question?” Responses were independently evaluated by 4 experienced oculopla…
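
As a concrete illustration of the querying approach described in the abstract, the sketch below shows how the study’s prompt template could be applied programmatically to a batch of patient questions. This is an assumption for illustration only: the study reports entering questions into the chatbot web interfaces, not an API, and the model name, helper function, and example question here are hypothetical.

    # Minimal sketch (assumption: the authors used the web chatbots, not this API).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = "As an oculoplastic surgeon, how can I respond to my patient's question?"

    def ask(question: str) -> str:
        """Send one rephrased patient question, prefixed with the study's prompt."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # stand-in for the ChatGPT-3.5 interface
            messages=[{"role": "user", "content": f"{PROMPT} {question}"}],
        )
        return response.choices[0].message.content

    # Hypothetical example question; the study's actual questions are not listed here.
    print(ask("How long will the swelling last after blepharoplasty?"))
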

Cited by 13 publications (5 citation statements)
References: 22 publications

Citation statements:
“…However, there was no significant difference in the accuracy among the models. The opposite was found in Al-Sharif et al [28] and in Abi-Rafeh et al [2], where GPT-3.5 outperformed Bard in providing comprehensive, accurate responses.…”
Section: Discussion (mentioning)
Confidence: 90%
“…Conversely, Gemini proved to be significantly superior in terms of readability, with responses at an average 10th-grade reading level and an average FRE score 10 points higher than that of the other models. Al-Sharif et al [28] identified similar results, with GPT having a higher automated readability index (ARI) score than BARD, indicating that a higher level of education was required to understand its responses. Perhaps providing tailored prompts requesting a specific reading level might consistently improve the readability of the answers.…”
Section: Discussion (mentioning)
Confidence: 91%
“…In a previous study, ChatGPT’s FKGL and FRE scores indicated a difficult reading level, appropriate for only 33% of adults and those with a college education [36]. Furthermore, the texts produced by ChatGPT were harder to read than those from Bard, Gemini’s predecessor [37]. Our results show similar characteristics: Gemini’s FKGL was significantly lower than ChatGPT-4’s, and although there was no significant difference in FRE score, Gemini’s was higher than ChatGPT-4’s, indicating that Gemini’s responses were easier to read.…”
Section: Discussion (mentioning)
Confidence: 99%
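
For context on the readability metrics cited above: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and the Automated Readability Index (ARI) are simple functions of sentence, word, syllable, and character counts. Higher FRE means easier text, while higher FKGL and ARI correspond to more years of education needed. The sketch below is an illustration only, not the scoring pipeline used in any of the cited studies; it applies the standard published formulas with a naive regex tokenizer and a heuristic vowel-group syllable counter, so exact scores will differ from dedicated tools.

    import re

    def count_syllables(word: str) -> int:
        """Heuristic syllable count: contiguous vowel groups, with a
        crude correction for a silent trailing 'e'. Approximate only."""
        word = word.lower()
        n = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and n > 1:
            n -= 1
        return max(n, 1)

    def readability(text: str) -> dict:
        """Compute FRE, FKGL, and ARI from the standard published formulas,
        using naive sentence/word splitting (illustrative, not validated)."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        chars = sum(len(w) for w in words)
        S, W = max(len(sentences), 1), max(len(words), 1)

        fre = 206.835 - 1.015 * (W / S) - 84.6 * (syllables / W)
        fkgl = 0.39 * (W / S) + 11.8 * (syllables / W) - 15.59
        ari = 4.71 * (chars / W) + 0.5 * (W / S) - 21.43
        return {"FRE": round(fre, 1), "FKGL": round(fkgl, 1), "ARI": round(ari, 1)}

    if __name__ == "__main__":
        sample = ("Blepharoplasty removes excess eyelid skin. "
                  "Most patients resume normal activity within two weeks.")
        print(readability(sample))

Under these formulas, a short-sentence, low-syllable answer scores a higher FRE and a lower FKGL/ARI, which is exactly the direction of the Gemini-versus-ChatGPT comparisons quoted above.
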
“…Despite considerable advances in technology, including natural language processing and emotional analysis, the capability of chatbots to fully comprehend and appropriately respond to the wide array of human emotions remains in question. This limitation is echoed in the works of Rahmanti et al (2022), Vannacci (2023), and Al-Sharif (2024), who critically examine the authenticity of empathetic responses generated by AI (Al-Sharif, 2024; Rahmanti et al, 2022; Vannacci, 2023), suggesting that achieving true emotional understanding may require further technological and conceptual advancements.…”
Section: Discussion (mentioning)
Confidence: 99%