Assessment of a Large Language Model’s Responses to Questions and Cases About Glaucoma and Retina Management

Huang, Andy S.; Hirabayashi, Kyle; Barna, Laura; Parikh, Deep; Pasquale, Louis R.

doi:10.1001/jamaophthalmol.2023.6917

Cited by 16 publications

(2 citation statements)

References 13 publications

(34 reference statements)

“…One possible explanation is that uniqueness neglect—a concern that algorithm providers are less able than human providers to account for residents’ (or patients’) unique characteristics and circumstances—drives consumer resistance to digital medical technology [ 23 ]. Therefore, personalized health management solutions based on large language models should be developed urgently [ 24 ] to meet the residents’ individual demands. In addition, a survey of population preferences for medical AI indicated that the most important factor for the public is that physicians are ultimately responsible for diagnosis and treatment planning [ 25 ].…”

Section: Discussion (mentioning)

confidence: 99%

Service Quality and Residents’ Preferences for Facilitated Self-Service Fundus Disease Screening: Cross-Sectional Study

Lin, Ma, Jiang et al. 2024

J Med Internet Res

Background: Fundus photography is the most important examination in eye disease screening. A facilitated self-service eye screening pattern based on the fully automatic fundus camera was developed in 2022 in Shanghai, China; it may help solve the problem of insufficient human resources in primary health care institutions. However, the service quality and residents’ preference for this new pattern are unclear. Objective: This study aimed to compare the service quality and residents’ preferences between facilitated self-service eye screening and traditional manual screening and to explore the relationships between the screening service’s quality and residents’ preferences. Methods: We conducted a cross-sectional study in Shanghai, China. Residents who underwent facilitated self-service fundus disease screening at one of the screening sites were assigned to the exposure group; those who were screened with a traditional fundus camera operated by an optometrist at an adjacent site comprised the control group. The primary outcome was the screening service quality, including effectiveness (image quality and screening efficiency), physiological discomfort, safety, convenience, and trustworthiness. The secondary outcome was the participants’ preferences. Differences in service quality and the participants’ preferences between the 2 groups were compared using chi-square tests separately. Subgroup analyses for exploring the relationships between the screening service’s quality and residents’ preference were conducted using generalized logit models. Results: A total of 358 residents enrolled; among them, 176 (49.16%) were included in the exposure group and the remaining 182 (50.84%) in the control group. Residents’ basic characteristics were balanced between the 2 groups. There was no significant difference in service quality between the 2 groups (image quality pass rate: P=.79; average screening time: P=.57; no physiological discomfort rate: P=.92; safety rate: P=.78; convenience rate: P=.95; trustworthiness rate: P=.20). However, the proportion of participants who were willing to use the same technology for their next screening was significantly lower in the exposure group than in the control group (P<.001). Subgroup analyses suggest that distrust in the facilitated self-service eye screening might increase the probability of refusal to undergo screening (P=.02). Conclusions: This study confirms that the facilitated self-service fundus disease screening pattern could achieve good service quality. However, it was difficult to reverse residents’ preferences for manual screening in a short period, especially when the original manual service was already excellent. Therefore, the digital transformation of health care must be cautious. We suggest that attention be paid to the residents’ individual needs. More efficient man-machine collaboration and personalized health management solutions based on large language models are both needed.

“…For example, ChatGPT-3.5 had similar or better accuracy than senior ophthalmology residents in diagnosing primary and secondary glaucoma cases retrieved from a public online database [ 7 ]. Similarly, ChatGPT-4 outperformed glaucoma specialists and was comparable with retina specialists in diagnostic and treatment accuracy of glaucoma and retina cases [ 8 ]. By contrast, ChatGPT exhibited reasonable but inferior diagnostic accuracy than human experts in cornea [ 9 ], uveitis [ 10 , 11 ], and neuro-ophthalmology [ 12 ] cases.…”

Section: Discussion (mentioning)

confidence: 99%

Artificial Versus Human Intelligence in the Diagnostic Approach of Ophthalmic Case Scenarios: A Qualitative Evaluation of Performance and Consistency

Mandalos, Tsouris 2024

Cureus

Purpose: To evaluate the efficiency of three artificial intelligence (AI) chatbots (ChatGPT-3.5 (OpenAI, San Francisco, California, United States), Bing Copilot (Microsoft Corporation, Redmond, Washington, United States), Google Gemini (Google LLC, Mountain View, California, United States)) in assisting the ophthalmologist in the diagnostic approach and management of challenging ophthalmic cases and compare their performance with that of a practicing human ophthalmic specialist. The secondary aim was to assess the short- and medium-term consistency of ChatGPT’s responses. Methods: Eleven ophthalmic case scenarios of variable complexity were presented to the AI chatbots and to an ophthalmic specialist in a stepwise fashion. Advice regarding the initial differential diagnosis, the final diagnosis, further investigation, and management was asked for. One month later, the same process was repeated twice on the same day for ChatGPT only. Results: The individual diagnostic performance of all three AI chatbots was inferior to that of the ophthalmic specialist; however, they provided useful complementary input in the diagnostic algorithm. This was especially true for ChatGPT and Bing Copilot. ChatGPT exhibited reasonable short- and medium-term consistency, with the mean Jaccard similarity coefficient of responses varying between 0.58 and 0.76. Conclusion: AI chatbots may act as useful assisting tools in the diagnosis and management of challenging ophthalmic cases; however, their responses should be scrutinized for potential inaccuracies, and by no means can they replace consultation with an ophthalmic specialist.
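
The consistency figure above is reported as a Jaccard similarity coefficient, which is the size of the intersection of two sets divided by the size of their union. A minimal sketch of that calculation follows. The abstract does not say how responses were converted into sets, so the `jaccard` helper and the example diagnosis lists here are hypothetical illustrations, not the authors' procedure or data.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Return |a ∩ b| / |a ∪ b| for two sets of items."""
    if not a and not b:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical differential diagnoses from two runs of the same prompt.
first_run = {"uveitis", "scleritis", "keratitis", "episcleritis"}
second_run = {"uveitis", "scleritis", "conjunctivitis"}

# Two shared items out of five distinct items overall.
score = jaccard(first_run, second_run)
print(round(score, 2))
```

A coefficient of 1 means the two responses listed exactly the same items, and 0 means they shared none.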

Large Language Models and the Shoreline of Ophthalmology

Young, Zhao 2024

JAMA Ophthalmol
