Background: Large language models exhibiting human-level performance in specialized tasks are emerging; examples include Generative Pretrained Transformer 3.5, which underlies the processing of ChatGPT. Rigorous trials are required to understand the capabilities of emerging technology, so that innovation can be directed to benefit patients and practitioners.

Objective: Here, we evaluated the strengths and weaknesses of ChatGPT in primary care using the Membership of the Royal College of General Practitioners Applied Knowledge Test (AKT) as a medium.

Methods: AKT questions were sourced from a web-based question bank and 2 AKT practice papers. In total, 674 unique AKT questions were input to ChatGPT, with the model's answers recorded and compared with the correct answers provided by the Royal College of General Practitioners. Each question was input twice in separate ChatGPT sessions, and answers on repeated trials were compared to gauge consistency. Subject difficulty was gauged by referring to examiners' reports from 2018 to 2022. Novel explanations from ChatGPT, defined as information provided that was not contained in the question or the multiple answer choices, were recorded. Performance was analyzed with respect to subject, difficulty, question source, and novel model outputs to explore ChatGPT's strengths and weaknesses.

Results: The average overall performance of ChatGPT was 60.17%, below the mean pass mark of the last 2 years (70.42%). Accuracy differed between question sources (P=.04 and .06). ChatGPT's performance varied with subject category (P=.02 and .02), but this variation did not correlate with difficulty (Spearman ρ=−0.241 and −0.238; P=.19 and .20). The proclivity of ChatGPT to provide novel explanations did not affect accuracy (P>.99 and .23).

Conclusions: Large language models are approaching human expert–level performance, although further development is required to match the performance of qualified primary care physicians in the AKT. Validated high-performance models may serve as assistants or autonomous clinical tools to help ameliorate the general practice workforce crisis.
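The difficulty correlation reported above (Spearman ρ ≈ −0.24) compares ranked subject difficulty against ranked model accuracy. As a minimal sketch of that statistic, not the authors' own analysis code, the rank-difference formula can be computed as follows (assuming no tied ranks; the input lists here are hypothetical):

```python
def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the difference between the ranks of x_i and y_i."""
    def ranks(values):
        # Assign rank 1..n by ascending value (assumes no ties).
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

With tied ranks (plausible for examination subject categories), a Pearson correlation on mid-ranks, as implemented in `scipy.stats.spearmanr`, would be used instead.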
Background/Objectives: Ophthalmic disorders account for 8% of hospital clinic attendances, the highest of any specialty. The fundamental need for a distance visual acuity (VA) measurement constrains remote consultation. A web application, DigiVis, facilitates self-assessment of VA using two internet-connected devices. This prospective validation study aimed to establish its accuracy, reliability, usability and acceptability.

Subjects/Methods: In total, 120 patients aged 5–87 years (median, 27 years) self-tested their vision twice using DigiVis in addition to their standard clinical assessment. Eyes with VA worse than +0.80 logMAR were excluded. Accuracy and test–retest (TRT) variability were assessed using Bland–Altman analysis and intraclass correlation coefficients (ICC). Patient feedback was analysed.

Results: Bias between VA tests was insignificant at −0.001 (95% CI −0.017 to 0.015) logMAR. The upper limit of agreement (LOA) was 0.173 (95% CI 0.146 to 0.201) logMAR and the lower LOA −0.175 (95% CI −0.202 to −0.147) logMAR. The ICC was 0.818 (95% CI 0.748 to 0.869). DigiVis TRT mean bias was similarly insignificant at 0.001 (95% CI −0.011 to 0.013) logMAR; the upper LOA was 0.124 (95% CI 0.103 to 0.144) logMAR and the lower LOA −0.121 (95% CI −0.142 to −0.101) logMAR. The ICC was 0.922 (95% CI 0.887 to 0.946). Of the subjects, 95% were willing to use DigiVis to monitor their vision at home.

Conclusions: Self-tested distance VA using DigiVis is accurate, reliable and well accepted by patients. The app has the potential to facilitate home monitoring, triage and remote consultation, but wide-scale implementation will require integration with NHS databases and secure patient data storage.
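The bias and limits of agreement quoted above come from a standard Bland–Altman analysis of paired measurements. As a minimal sketch of that computation, not the study's own code, the bias is the mean of the per-eye differences and the 95% LOA are bias ± 1.96 × SD of those differences (the example logMAR values below are hypothetical):

```python
from statistics import mean, stdev

def bland_altman(test_a, test_b):
    """Mean bias and 95% limits of agreement for paired measurements.

    test_a, test_b: equal-length lists of paired scores for the same eyes
    (e.g., app-based vs. standard logMAR VA). Returns (bias, lower_loa,
    upper_loa), where LOA = bias +/- 1.96 * SD of the differences.
    """
    diffs = [a - b for a, b in zip(test_a, test_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

The same per-eye difference list is also the starting point for the test–retest comparison, with the two DigiVis measurements substituted for the two methods.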