2024
DOI: 10.1097/iop.0000000000002567

Evaluating the Accuracy of ChatGPT and Google BARD in Fielding Oculoplastic Patient Queries: A Comparative Study on Artificial versus Human Intelligence

Eman M. Al-Sharif,
Rafaella C. Penteado,
Nahia Dib El Jalbout
et al.

Abstract: Purpose: This study evaluates and compares the accuracy of responses from 2 artificial intelligence platforms to patients’ oculoplastics-related questions. Methods: Questions directed toward oculoplastic surgeons were collected, rephrased, and input independently into the ChatGPT-3.5 and BARD chatbots, using the prompt: “As an oculoplastic surgeon, how can I respond to my patient’s question?” Responses were independently evaluated by 4 experienced oculopla…
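
As a concrete illustration of the querying approach described in the abstract, the sketch below shows how the study’s prompt template could be applied programmatically to a batch of patient questions. This is an assumption for illustration only: the study reports entering questions into the chatbot web interfaces, not an API, and the model name, helper function, and example question here are hypothetical.

    # Minimal sketch (assumption: the authors used the web chatbots, not this API).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = "As an oculoplastic surgeon, how can I respond to my patient's question?"

    def ask(question: str) -> str:
        """Send one rephrased patient question, prefixed with the study's prompt."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # stand-in for the ChatGPT-3.5 interface
            messages=[{"role": "user", "content": f"{PROMPT} {question}"}],
        )
        return response.choices[0].message.content

    # Hypothetical example question; the study's actual questions are not listed here.
    print(ask("How long will the swelling last after blepharoplasty?"))
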

Cited by 13 publications (5 citation statements)
References: 22 publications

Citation statements:
“…However, there was no significant difference in the accuracy among the models. The opposite was found in Al-Sharif et al [28] and in Abi-Rafeh et al [2], where GPT-3.5 outperformed Bard in providing comprehensive, accurate responses.…”
Section: Discussion (mentioning)
Confidence: 90%
“…Conversely, Gemini proved to be significantly superior in terms of readability, with responses at an average 10th-grade reading level and an average FRE score 10 points higher than that of the other models. Al-Sharif et al [28] identified similar results, with GPT having a higher automated readability index (ARI) score than BARD, indicating that a higher level of education was required to understand its responses. Perhaps providing tailored prompts requesting a specific reading level might consistently improve the readability of the answers.…”
Section: Discussion (mentioning)
Confidence: 91%
“…In a previous study, ChatGPT’s FKGL and FRE scores indicated a difficult reading level, appropriate for only 33% of adults and those with a college education [36]. Furthermore, the texts produced by ChatGPT were harder to read than those from Bard, Gemini’s predecessor [37]. Our results show similar characteristics: Gemini’s FKGL was significantly lower than ChatGPT-4’s, and although there was no significant difference in FRE score, Gemini’s was higher than ChatGPT-4’s, indicating that Gemini’s responses were easier to read.…”
Section: Discussion (mentioning)
Confidence: 99%
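
For context on the readability metrics cited above: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and the Automated Readability Index (ARI) are simple functions of sentence, word, syllable, and character counts. Higher FRE means easier text, while higher FKGL and ARI correspond to more years of education needed. The sketch below is an illustration only, not the scoring pipeline used in any of the cited studies; it applies the standard published formulas with a naive regex tokenizer and a heuristic vowel-group syllable counter, so exact scores will differ from dedicated tools.

    import re

    def count_syllables(word: str) -> int:
        """Heuristic syllable count: contiguous vowel groups, with a
        crude correction for a silent trailing 'e'. Approximate only."""
        word = word.lower()
        n = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and n > 1:
            n -= 1
        return max(n, 1)

    def readability(text: str) -> dict:
        """Compute FRE, FKGL, and ARI from the standard published formulas,
        using naive sentence/word splitting (illustrative, not validated)."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        chars = sum(len(w) for w in words)
        S, W = max(len(sentences), 1), max(len(words), 1)

        fre = 206.835 - 1.015 * (W / S) - 84.6 * (syllables / W)
        fkgl = 0.39 * (W / S) + 11.8 * (syllables / W) - 15.59
        ari = 4.71 * (chars / W) + 0.5 * (W / S) - 21.43
        return {"FRE": round(fre, 1), "FKGL": round(fkgl, 1), "ARI": round(ari, 1)}

    if __name__ == "__main__":
        sample = ("Blepharoplasty removes excess eyelid skin. "
                  "Most patients resume normal activity within two weeks.")
        print(readability(sample))

Under these formulas, a short-sentence, low-syllable answer scores a higher FRE and a lower FKGL/ARI, which is exactly the direction of the Gemini-versus-ChatGPT comparisons quoted above.
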
“…Despite considerable advances in technology, including natural language processing and emotional analysis, the capability of chatbots to fully comprehend and appropriately respond to the wide array of human emotions remains in question. This limitation is echoed in the works of Rahmanti et al (2022), Vannacci (2023), and Al-Sharif (2024), who critically examine the authenticity of empathetic responses generated by AI (Al-Sharif, 2024; Rahmanti et al, 2022; Vannacci, 2023), suggesting that achieving true emotional understanding may require further technological and conceptual advancements.…”
Section: Discussion (mentioning)
Confidence: 99%