Objectives: Establishing confidence in the safety of Artificial Intelligence (AI)-based clinical decision support systems is important prior to clinical deployment and regulatory approval for systems with increasing autonomy. Here, we undertook safety assurance of the AI Clinician, a previously published reinforcement learning-based treatment recommendation system for sepsis. Methods: As part of the safety assurance, we defined four clinical hazards in sepsis resuscitation based on clinical expert opinion and the existing literature. We then identified a set of unsafe scenarios, intended to limit the action space of the AI agent with the goal of reducing the likelihood of hazardous decisions. Results: Using a subset of the Medical Information Mart for Intensive Care (MIMIC-III) database, we demonstrated that our previously published AI Clinician recommended fewer hazardous decisions than human clinicians in three out of our four predefined clinical scenarios, while the difference was not statistically significant in the fourth scenario. Then, we modified the reward function to satisfy our safety constraints and trained a new AI Clinician agent. The retrained model shows enhanced safety, without negatively impacting model performance. Discussion: While some contextual patient information absent from the data may have pushed human clinicians to take hazardous actions, the data were curated to limit the impact of this confounder. Conclusion: These advances provide a use case for the systematic safety assurance of AI-based clinical systems towards the generation of explicit safety evidence, which could be replicated for other AI applications or other clinical contexts, and inform medical device regulatory bodies.
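To make the safety-constraint idea concrete, below is a minimal sketch of how a reward function could be modified to penalise actions that fall inside a predefined unsafe scenario. The state variables, thresholds and penalty value are illustrative assumptions, not the published AI Clinician code.

```python
# Minimal sketch, assuming discretised fluid/vasopressor actions and an unsafe
# scenario defined by a hypotension threshold. Names, the 55 mmHg threshold and
# the penalty magnitude are hypothetical, for illustration only.

def is_unsafe(state, action):
    """Example hazard: withholding both IV fluid and vasopressor while the
    patient is profoundly hypotensive (illustrative MAP threshold)."""
    return state["map"] < 55 and action["iv_fluid"] == 0 and action["vasopressor"] == 0

def shaped_reward(state, action, base_reward, penalty=10.0):
    """Subtract a fixed penalty whenever the state-action pair lies in the
    predefined unsafe region, discouraging the agent from recommending it."""
    return base_reward - penalty if is_unsafe(state, action) else base_reward

# Usage: rewards attached to the offline training transitions are replaced by
# their shaped values before retraining the agent.
print(shaped_reward({"map": 50.0}, {"iv_fluid": 0, "vasopressor": 0}, base_reward=0.0))  # -10.0
```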
Reinforcement Learning (RL) is emerging as a tool for tackling complex control and decision-making problems. However, in high-risk environments such as healthcare, manufacturing, automotive or aerospace, it is often challenging to bridge the gap between an apparently optimal policy learned by an agent and its real-world deployment, owing to the uncertainties and risk associated with deployment. Broadly speaking, RL agents face two kinds of uncertainty: (1) aleatoric uncertainty, which reflects randomness or noise in the dynamics of the world, and (2) epistemic uncertainty, which reflects the bounded knowledge of the agent due to model limitations and the finite amount of information/data the agent has acquired about the world. These two types of uncertainty carry fundamentally different implications for the evaluation of performance and the level of risk or trust. Yet aleatoric and epistemic uncertainties are generally confounded, as standard and even distributional RL are agnostic to this difference. Here we propose how a distributional approach (UA-DQN) can be recast to decompose and separately render the net effects of each type of uncertainty. We demonstrate the operation of this method in grid-world examples to build intuition and then show a proof-of-concept application for an RL agent operating as a clinical decision support system in critical care.
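One way to make the decomposition concrete: assuming an ensemble of distributional (quantile) estimates of the return for a given action, as in UA-DQN-style agents, epistemic uncertainty can be read off the disagreement between ensemble members, while aleatoric uncertainty corresponds to the spread within each member. The sketch below is illustrative; the array shapes and variable names are assumptions rather than the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of an ensemble-based uncertainty decomposition: given an
# ensemble of quantile estimates of the return for a single action, split total
# variance into epistemic (between-member disagreement) and aleatoric
# (average within-member spread).

def decompose_uncertainty(quantiles):
    """quantiles: array of shape (n_ensemble, n_quantiles) for one action."""
    member_means = quantiles.mean(axis=1)      # one mean return per ensemble member
    epistemic = member_means.var()             # variance across the ensemble
    aleatoric = quantiles.var(axis=1).mean()   # mean within-member variance
    return epistemic, aleatoric

rng = np.random.default_rng(0)
q = rng.normal(loc=[[1.0], [1.2], [0.8]], scale=0.5, size=(3, 32))
print(decompose_uncertainty(q))
```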
Background: The challenge of responsibly guiding clinicians to incorporate AI recommendations and explanations into their day-to-day practice has thus far neglected the realm of decisions outside of diagnosis (where there is no gold standard to compare against). We assess how clinicians' decisions may be influenced by additional information more broadly, and how this influence is modified by the source of the information (human peers or AI) and by the presence or absence of an AI explanation (XAI, here using simple feature importance). Methods: We conducted a human-AI interaction study with ICU doctors using a modified between-subjects design. In each of 16 trials, doctors were presented on a computer with a patient case and prompted to prescribe continuous doses of IV fluid and vasopressor. We used a multi-factorial experimental design with four arms, where each clinician experienced all four arms on different subsets of our 24 patients. The four arms were (i) baseline (control), (ii) a peer scenario showing what doses had been prescribed by other human clinicians, (iii) an AI suggestion and (iv) an XAI suggestion. Findings: Among 86 ICU doctors we had four key findings. First, additional information (peer, AI or XAI) had a strong influence on prescriptions (significant for AI, but not for peers), yet XAI did not have a greater influence than AI alone. Second, inter-clinician prescription variability was affected differently depending on whether the recommendation (peer, AI or XAI) was higher or lower than what subjects in the baseline arm prescribed. Third, neither attitudes towards AI nor clinical experience was correlated with the AI-supported decisions. Fourth, there was no correlation between how useful doctors self-reported finding the XAI and whether the XAI actually influenced their prescriptions. Interpretation: Taken together, our findings on a comparatively large clinical expert population raise important questions about the meaning and design of medical XAI systems. Specifically, we show that the marginal impact of XAI as currently formulated is low in a medical population. We also cast doubt on the utility of self-reports as a valid metric for assessing XAI in clinical experts, compared with our more objective behavioural-response paradigm. Further work in this area could look to higher-fidelity and more granular markers that assess the natural behaviour of clinicians when they interact with decision support tools.
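For illustration only, the following is a minimal sketch of how patient cases might be counterbalanced across the four arms so that each clinician experiences every arm on a different subset of cases. This assumed round-robin allocation is not the study's actual randomisation procedure.

```python
import random

# Hypothetical allocation sketch: shuffle one clinician's cases and deal them
# round-robin into the four experimental arms.

ARMS = ["baseline", "peer", "AI", "XAI"]

def assign_cases(case_ids, seed):
    """Return a dict mapping each arm to the subset of cases shown under it."""
    rng = random.Random(seed)
    shuffled = case_ids[:]
    rng.shuffle(shuffled)
    return {arm: shuffled[i::len(ARMS)] for i, arm in enumerate(ARMS)}

print(assign_cases(list(range(1, 17)), seed=42))  # 16 trials split across 4 arms
```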