The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 for the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints, suggesting that AI chatbots such as ChatGPT-3 can generate useful differential-diagnosis lists for common chief complaints. However, the ranking of diagnoses within these lists leaves room for improvement.
Intramural coronary transfer with an aortic sinus pouch. Central Message: We applied an aortic sinus pouch technique for dextrotransposition of the great arteries with an intramural coronary artery.
Background: AI-driven automated medical-history-taking systems that present AI-generated differential-diagnosis lists have been shown to improve physicians’ diagnostic accuracy. However, given the potential negative effects of AI-driven differential-diagnosis lists, such as omission errors (physicians rejecting a correct diagnosis suggested by the AI) and commission errors (physicians accepting an incorrect diagnosis suggested by the AI), the efficacy of AI-driven automated medical-history-taking systems without AI-driven differential-diagnosis lists on physicians’ diagnostic accuracy should also be evaluated. Objective: The present study evaluated the efficacy of AI-driven automated medical-history-taking systems, with or without AI-driven differential-diagnosis lists, on physicians’ diagnostic accuracy. Methods: This randomized controlled study was conducted in January 2021 and included 22 physicians working at a university hospital. Participants read 16 clinical vignettes based on AI-taken medical histories of real patients and generated up to three differential diagnoses per case. Participants were divided into two groups: with and without an AI-driven differential-diagnosis list. Results: There was no significant difference in diagnostic accuracy between the two groups (57.4% vs. 56.3%, respectively; p = 0.91). Vignettes in which the AI-generated list included the correct diagnosis showed the greatest positive effect on physicians’ diagnostic accuracy (adjusted odds ratio 7.68; 95% CI 4.68–12.58; p < 0.001). In the group with AI-driven differential-diagnosis lists, 15.9% of diagnoses were omission errors and 14.8% were commission errors. Conclusions: Physicians’ diagnostic accuracy using AI-driven automated medical histories did not differ between the groups with and without AI-driven differential-diagnosis lists.
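To make the two error definitions above concrete, the sketch below classifies a physician's diagnosis against the AI-generated list. This is a hypothetical illustration, not the study's coding scheme; the function name, the exact-string matching of diagnoses, and the precedence of omission over commission when both definitions could apply are all assumptions.

```python
# Hypothetical illustration of the omission/commission error definitions
# above; not the study's coding scheme. Assumption: when both definitions
# could apply, omission is checked first.

def classify_error(physician_dx: str, ai_list: list[str], correct_dx: str) -> str:
    """Classify one physician diagnosis against the AI suggestion list."""
    if physician_dx == correct_dx:
        return "correct"
    if correct_dx in ai_list:
        return "omission"      # AI suggested the correct diagnosis; rejected
    if physician_dx in ai_list:
        return "commission"    # AI's incorrect suggestion was accepted
    return "other error"       # wrong, but unrelated to the AI list

print(classify_error("GERD", ["acute coronary syndrome", "pneumonia"],
                     "acute coronary syndrome"))  # -> "omission"
print(classify_error("GERD", ["GERD", "pneumonia"],
                     "acute coronary syndrome"))  # -> "commission"
```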
A diagnostic decision support system (DDSS) is expected to reduce diagnostic errors. However, its effect on physicians’ diagnostic decisions remains unclear. Our study aimed to assess the prevalence of artificial intelligence (AI)-generated diagnoses in physicians’ differential diagnoses when using an AI-driven DDSS that generates a differential-diagnosis list from information entered by the patient before the clinical encounter. In this randomized controlled study, an exploratory analysis was performed. Twenty-two physicians each read 16 clinical vignettes and generated up to three differential diagnoses per case. Participants were divided into an intervention group, which received the AI-generated differential-diagnosis list, and a control group, which did not. The prevalence of physician diagnoses identical to an AI-generated differential diagnosis (the primary outcome) was significantly higher in the intervention group than in the control group (70.2% vs. 55.1%, p < 0.001). The primary outcome was more than 10 percentage points higher in the intervention group than in the control group in all subgroups except attending physicians and physicians who did not trust AI. This study suggests that at least 15% of physicians’ differential diagnoses were affected by the differential-diagnosis list of the AI-driven DDSS.
Background Automated medical history–taking systems that generate differential diagnosis lists have been suggested to contribute to improved diagnostic accuracy. However, the effect of these systems on diagnostic errors in clinical practice remains unknown. Objective This study aimed to assess the incidence of diagnostic errors in an outpatient department, where an artificial intelligence (AI)–driven automated medical history–taking system that generates differential diagnosis lists was implemented in clinical practice. Methods We conducted a retrospective observational study using data from a community hospital in Japan. We included patients aged 20 years and older who used an AI-driven, automated medical history–taking system that generates differential diagnosis lists in the outpatient department of internal medicine for whom the index visit was between July 1, 2019, and June 30, 2020, followed by unplanned hospitalization within 14 days. The primary endpoint was the incidence of diagnostic errors, which were detected using the Revised Safer Dx Instrument by at least two independent reviewers. To evaluate the effect of differential diagnosis lists from the AI system on the incidence of diagnostic errors, we compared the incidence of these errors between a group where the AI system generated the final diagnosis in the differential diagnosis list and a group where the AI system did not generate the final diagnosis in the list; the Fisher exact test was used for comparison between these groups. For cases with confirmed diagnostic errors, further review was conducted to identify the contributing factors of these errors via discussion among three reviewers, using the Safer Dx Process Breakdown Supplement as a reference. Results A total of 146 patients were analyzed. A final diagnosis was confirmed for 138 patients and was observed in the differential diagnosis list from the AI system for 69 patients. Diagnostic errors occurred in 16 out of 146 patients (11.0%, 95% CI 6.4%-17.2%). Although the difference was not statistically significant, the incidence of diagnostic errors was lower in cases where the final diagnosis was included in the differential diagnosis list from the AI system than in cases where the final diagnosis was not included in the list (7.2% vs 15.9%, P=.18). Regarding the quality of the clinical history taken by the AI system, the final diagnosis could be readily assumed by reading only the system-generated clinical history in 11 of the 16 error cases (68.8%). Conclusions The incidence of diagnostic errors among patients in the outpatient department of internal medicine who used an automated medical history–taking system that generates differential diagnosis lists seemed to be lower than the previously reported incidence of diagnostic errors. This result suggests that the implementation of an automated medical history–taking system that generates differential diagnosis lists could be beneficial for diagnostic safety in the outpatient department of internal medicine.
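The Fisher exact comparison reported above can be illustrated from the published figures. The Python sketch below is not the study's analysis code; the 2×2 cell counts (5/69 errors when the final diagnosis appeared in the AI list, 11/69 when it did not, since 138 − 69 = 69) are inferred from the reported percentages and group sizes.

```python
# Illustration of the Fisher exact comparison reported above. The cell
# counts are inferred from the abstract: 69 patients whose final diagnosis
# appeared in the AI list (7.2% of 69 ~= 5 errors) and 69 whose diagnosis
# did not (15.9% of 69 ~= 11 errors).
from scipy import stats

table = [
    [5, 64],   # diagnosis in AI list:     errors, no errors
    [11, 58],  # diagnosis not in AI list: errors, no errors
]
odds_ratio, p_value = stats.fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2f}")  # p should be ~.18
```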
This study aimed to investigate the outcomes of consultations from gastroenterologists to generalist physicians for the diagnostic workup of undiagnosed chronic abdominal pain. This was a single-center, retrospective, descriptive study. We included patients aged ≥15 years who were referred from the Department of Gastroenterology to the Department of Diagnostic Medicine at Dokkyo University Hospital to establish a diagnosis for chronic abdominal pain, between 1 April 2016 and 31 August 2020. We retrospectively reviewed the patients’ medical charts and extracted data. A total of 12 cases were included. Eight patients (66.7%) had been diagnosed with and treated for functional gastrointestinal disorders (FGID) at the Department of Gastroenterology; their lack of improvement under treatment for FGID was the reason for their referral to the Department of Diagnostic Medicine for further examination. After this consultation, new possible diagnoses were generated for eight patients (66.7%). Six of these eight patients (75.0%) were diagnosed with abdominal wall pain (anterior cutaneous nerve entrapment syndrome, n = 3; myofascial pain, n = 1; falciform pain, n = 1; and herpes zoster non-herpeticus, n = 1). Consultation from gastroenterologists to generalists generated new possible diagnoses in approximately 70% of patients with undiagnosed chronic abdominal pain.
Background Low diagnostic accuracy is a major concern in automated medical history–taking systems with differential diagnosis (DDx) generators. A possible solution is to extend the concept of collective intelligence, whereby an integrated judgment from multiple people is more accurate than the judgment of a single person, to DDx generators by integrating the diagnosis lists of several generators. Objective The purpose of this study was to assess whether the combined use of several DDx generators improves the diagnostic accuracy of DDx lists. Methods We used medical history data and the top 10 DDx lists (index DDx lists) generated by an artificial intelligence (AI)–driven automated medical history–taking system from 103 patients with confirmed diagnoses. Two research physicians independently created additional top 10 DDx lists (second and third DDx lists) per case by entering key information from the medical history generated by the automated medical history–taking system into 2 other DDx generators, without reading the index lists. We used the McNemar test to assess the improvement in diagnostic accuracy from the index DDx lists to three types of combined DDx lists: (1) simply combining the index, second, and third lists; (2) creating a new top 10 DDx list using a 1/n weighting rule; and (3) creating a list containing only the diagnoses shared among the index, second, and third lists. We treated the data generated by the 2 research physicians from the same patient as independent cases; therefore, the number of cases included in analyses using the 2 additional lists was 206 (103 cases × 2 physicians’ input). Results The diagnostic accuracy of the index lists was 46% (47/103). Diagnostic accuracy was improved by simply combining the 2 additional DDx lists (133/206, 65%, P<.001), whereas the other 2 combination strategies did not improve the diagnostic accuracy of the DDx lists (106/206, 52%, P=.05 for the collective list with the 1/n weighting rule and 29/206, 14%, P<.001 for the list of only shared diagnoses). Conclusions Simply adding each of the top 10 DDx lists from additional DDx generators increased the diagnostic accuracy of the DDx list by approximately 20 percentage points, suggesting that the combined use of DDx generators early in the diagnostic process is beneficial.
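As an illustration of the three combination strategies described above, the sketch below implements them in Python. It is a minimal sketch under stated assumptions, not the study's code: the 1/n weighting rule is read as scoring the diagnosis at rank n as 1/n in each list and summing scores across lists, and the function names and example diagnoses are hypothetical.

```python
# Minimal sketch of the three DDx-list combination strategies described
# above. Assumption: the 1/n weighting rule gives the diagnosis at rank n
# a score of 1/n in each list, sums scores across lists, and keeps the
# top 10. Names and example diagnoses are hypothetical.
from collections import defaultdict

def combine_simple(*ddx_lists):
    """Strategy 1: pool every diagnosis from all lists (deduplicated)."""
    pooled = []
    for ddx in ddx_lists:
        for dx in ddx:
            if dx not in pooled:
                pooled.append(dx)
    return pooled

def combine_weighted(*ddx_lists, top_n=10):
    """Strategy 2: 1/n weighting rule -> a new top-10 list."""
    scores = defaultdict(float)
    for ddx in ddx_lists:
        for rank, dx in enumerate(ddx, start=1):
            scores[dx] += 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def combine_shared(*ddx_lists):
    """Strategy 3: keep only diagnoses appearing in every list."""
    shared = set(ddx_lists[0]).intersection(*ddx_lists[1:])
    return [dx for dx in ddx_lists[0] if dx in shared]

# Hypothetical index, second, and third lists for one case.
index_list  = ["pneumonia", "heart failure", "pulmonary embolism"]
second_list = ["heart failure", "pneumonia", "COPD exacerbation"]
third_list  = ["pulmonary embolism", "pneumonia", "asthma"]
print(combine_weighted(index_list, second_list, third_list))
# -> ['pneumonia', 'heart failure', 'pulmonary embolism', ...]
```

Diagnostic accuracy under each strategy is then simply whether the confirmed final diagnosis appears in the combined list, which is what the McNemar comparisons above evaluate.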