Background Automated medical history–taking systems that generate differential diagnosis lists have been suggested to contribute to improved diagnostic accuracy. However, the effect of these systems on diagnostic errors in clinical practice remains unknown.
Objective This study aimed to assess the incidence of diagnostic errors in an outpatient department where an artificial intelligence (AI)–driven automated medical history–taking system that generates differential diagnosis lists was implemented in clinical practice.
Methods We conducted a retrospective observational study using data from a community hospital in Japan. We included patients aged 20 years and older who used an AI-driven automated medical history–taking system that generates differential diagnosis lists in the outpatient department of internal medicine, whose index visit was between July 1, 2019, and June 30, 2020, and who had an unplanned hospitalization within 14 days of that visit. The primary endpoint was the incidence of diagnostic errors, which were detected using the Revised Safer Dx Instrument by at least two independent reviewers. To evaluate the effect of the AI-generated differential diagnosis lists on the incidence of diagnostic errors, we compared the incidence between the group in which the AI system included the final diagnosis in its differential diagnosis list and the group in which it did not; the Fisher exact test was used for this comparison. For cases with confirmed diagnostic errors, further review was conducted to identify the contributing factors of these errors through discussion among three reviewers, using the Safer Dx Process Breakdown Supplement as a reference.
Results A total of 146 patients were analyzed. A final diagnosis was confirmed for 138 patients and appeared in the AI-generated differential diagnosis list for 69 patients. Diagnostic errors occurred in 16 of 146 patients (11.0%, 95% CI 6.4%-17.2%). Although the difference was not statistically significant, the incidence of diagnostic errors was lower in cases where the final diagnosis was included in the AI-generated differential diagnosis list than in cases where it was not (7.2% vs 15.9%, P=.18). Regarding the quality of the clinical history taken by the system, the final diagnosis was easily assumed by reading only that clinical history in 11 of 16 cases (68.8%).
Conclusions The incidence of diagnostic errors among patients in the outpatient department of internal medicine who used an automated medical history–taking system that generates differential diagnosis lists appeared to be lower than previously reported incidences of diagnostic errors. This result suggests that implementing such a system could be beneficial for diagnostic safety in the outpatient department of internal medicine.
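For orientation, the following Python sketch reproduces the headline statistics reported above. The per-group 2×2 counts (5/69 vs 11/69 diagnostic errors) are inferred from the reported 7.2% vs 15.9% and the 138 patients with a confirmed final diagnosis, and the Clopper-Pearson method for the confidence interval is an assumption; neither detail is taken from the study itself.

```python
# Hypothetical re-calculation of the abstract's headline numbers.
# ASSUMPTIONS: the group counts (5/69 vs 11/69) are inferred from the reported
# percentages; the exact (Clopper-Pearson) CI method is not stated by the authors.
from scipy.stats import fisher_exact
from statsmodels.stats.proportion import proportion_confint

# Overall incidence of diagnostic errors: 16 of 146 patients.
low, high = proportion_confint(count=16, nobs=146, alpha=0.05, method="beta")
print(f"Incidence: {16 / 146:.1%} (95% CI {low:.1%}-{high:.1%})")

# 2x2 table: rows = final diagnosis in the AI list (yes/no), columns = diagnostic error (yes/no).
table = [[5, 64],    # final diagnosis in the AI list: 5 errors, 64 without error
         [11, 58]]   # final diagnosis not in the AI list: 11 errors, 58 without error
odds_ratio, p_value = fisher_exact(table)
print(f"Fisher exact test: P = {p_value:.2f}")  # approximately .18, as reported
```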
Background Low diagnostic accuracy is a major concern in automated medical history–taking systems with differential diagnosis (DDx) generators. Extending the concept of collective intelligence to DDx generators, so that judgments become more accurate when an integrated diagnosis list from multiple sources is accepted rather than a list from a single source, may be a possible solution.
Objective This study aimed to assess whether the combined use of several DDx generators improves the diagnostic accuracy of DDx lists.
Methods We used medical history data and the top 10 DDx lists (index DDx lists) generated by an artificial intelligence (AI)–driven automated medical history–taking system from 103 patients with confirmed diagnoses. Two research physicians independently created two additional top 10 DDx lists (second and third DDx lists) per case by entering key information into 2 other DDx generators, based on the medical history generated by the automated medical history–taking system and without reading the index lists it generated. We used the McNemar test to assess the improvement in diagnostic accuracy from the index DDx lists to three types of combined DDx lists: (1) simply combining the DDx lists from the index, second, and third lists; (2) creating a new top 10 DDx list using a 1/n weighting rule; and (3) creating new lists containing only the diagnoses shared among the index, second, and third lists. We treated the data generated by the 2 research physicians from the same patient as independent cases; therefore, the analyses using the 2 additional lists included 206 cases (103 cases × 2 physicians' input).
Results The diagnostic accuracy of the index lists was 46% (47/103). Diagnostic accuracy improved when the 2 additional DDx lists were simply combined (133/206, 65%; P<.001), whereas the other 2 combined DDx lists did not improve it (106/206, 52%; P=.052 for the collective list with the 1/n weighting rule, and 29/206, 14%; P<.001 for the list restricted to diagnoses shared among the 3 DDx lists).
Conclusions Simply adding each of the top 10 DDx lists from additional DDx generators increased the diagnostic accuracy of the DDx list by approximately 20 percentage points, suggesting that the combined use of DDx generators early in the diagnostic process is beneficial.
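As an illustration of the three combination strategies described in the Methods, the following Python sketch implements one plausible reading of each. The interpretation of the 1/n weighting rule (a diagnosis at rank n in a list contributes a score of 1/n, summed across lists) and all function and variable names are assumptions for illustration, not details taken from the study.

```python
# Illustrative sketch of the three DDx-list combination strategies
# (assumed interpretations; not the authors' actual implementation).
from collections import defaultdict
from typing import Iterable, List, Set

def simple_combination(ddx_lists: List[List[str]]) -> Set[str]:
    """Strategy 1: pool every diagnosis appearing in any of the lists."""
    return {dx for lst in ddx_lists for dx in lst}

def weighted_top10(ddx_lists: List[List[str]]) -> List[str]:
    """Strategy 2 (assumed reading of the 1/n rule): a diagnosis at rank n in a
    list scores 1/n; scores are summed across lists and the top 10 are kept."""
    scores = defaultdict(float)
    for lst in ddx_lists:
        for rank, dx in enumerate(lst, start=1):
            scores[dx] += 1.0 / rank
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:10]

def shared_only(ddx_lists: List[List[str]]) -> Set[str]:
    """Strategy 3: keep only diagnoses present in every list."""
    shared = set(ddx_lists[0])
    for lst in ddx_lists[1:]:
        shared &= set(lst)
    return shared

def contains_final_dx(final_dx: str, candidates: Iterable[str]) -> bool:
    """A combined list counts as a 'hit' if the confirmed final diagnosis appears in it."""
    return final_dx in candidates
```

Under this reading, per-case accuracy is the proportion of cases in which contains_final_dx returns True for a given combined list, and the McNemar test compares the paired hit/miss outcomes of the index list alone against those of each combined list.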