Personalized vital signs control based on continuous action-space reinforcement learning with supervised experience

Sun, C. P.; Hong, Shenda; Song, Moxian; Shang, Junyuan; Li, Hongyan

doi:10.1016/j.bspc.2021.102847

Cited by 10 publications

(18 citation statements)

References 32 publications

“…Because only the patient's survival is concerned, the reward is observed after a long sequence of decisions. We also apply intermediate rewards and final reward in the form of SOFA change and survival after 90 days respectively [25]. SOFA represents the evidence of organ dysfunction and has been recommended by experts as a screening tool for sepsis [34].…”

Section: Methods (mentioning)

confidence: 99%

“…Off-policy evaluation. In experiments, we use the intermediate reward parameter β_s = 0.6 and the terminal reward parameter β_T = 24, following the setting in existing works [25]. Specifically, the terminal reward is 24 if the patient survives, otherwise -24.…”

Section: Methods (mentioning)

confidence: 99%
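
The reward setting quoted in the statement above can be illustrated with a minimal sketch. Only the values β_s = 0.6, β_T = 24, and the ±24 terminal reward come from the quoted text. The function names are hypothetical, and the form of the intermediate reward (a penalty proportional to the increase in SOFA score between consecutive steps) is an assumption made for illustration, since the excerpt does not give that formula.

```python
from typing import Sequence


def terminal_reward(survived: bool, beta_t: float = 24.0) -> float:
    """Terminal reward as quoted: +beta_T if the patient survives, else -beta_T."""
    return beta_t if survived else -beta_t


def intermediate_reward(sofa_prev: float, sofa_next: float, beta_s: float = 0.6) -> float:
    """ASSUMED form: reward a drop in SOFA score, penalize a rise, scaled by beta_s.

    The quoted excerpt only says the intermediate reward is based on SOFA change;
    the exact formula is not shown there.
    """
    return -beta_s * (sofa_next - sofa_prev)


def episode_return(sofa_scores: Sequence[float], survived: bool) -> float:
    """Undiscounted sum of the assumed intermediate rewards plus the terminal reward."""
    step_rewards = (
        intermediate_reward(prev, nxt)
        for prev, nxt in zip(sofa_scores, sofa_scores[1:])
    )
    return sum(step_rewards) + terminal_reward(survived)


if __name__ == "__main__":
    # Hypothetical trajectory: SOFA score improves from 10 to 7 and the patient survives.
    print(episode_return([10, 8, 7], survived=True))
```

Under this assumed form, a trajectory whose SOFA score falls accumulates small positive rewards along the way, while the terminal term dominates the return.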

“…Then, we calculate the average survival rate based on the return value. The survival formula [25] is shown below:…”

Section: Methods (mentioning)

confidence: 99%

“…Nonetheless, a DRL agent only interacts with the environment to seek the optimal actions with high reward, regardless of the potential risks. It has been noted that certain actions induced by AI could cause high risk and lead to contentious medical solutions [25], which has significantly stymied the broad adoption of AI in healthcare management. On the other front, human experts maintain an edge over AI in abstract reasoning under ambiguous conditions.…”

Section: Q Value Function (mentioning)

confidence: 99%

“…Remarkably, AI or data-driven models suffer from biases in data and model building, and consequently may provoke treatment solutions that are against the principle of clinic practices. To this end, hybrid systems of SL and RL that capitalize on the availability of large-scale EMR have been proposed, which are capable of providing reliable medical recommendations [24,25]. Nevertheless, the usage of SL not only increases the computational complexity but also limits the self-adaptiveness of the RL decision in long-term reward [26].…”

Section: Introduction (mentioning)

confidence: 99%

WD3QNE: A value-based deep reinforcement learning model with human expertise in optimal treatment of sepsis

et al. 2022

Preprint

Deep Reinforcement Learning (DRL) has been increasingly attempted in assisting clinicians for real-time treatment of sepsis. While a value function quantifies the performance of policies in such decision-making processes, most value-based DRL algorithms cannot evaluate the target value function precisely and are not as safe as clinical experts. In this study, we propose a Weighted Dueling Double Deep Q-Network with embedded human Expertise (WD3QNE) that incorporates a target Q value function with adaptive dynamic weight for enhanced policy improvement and human expertise in decision-making for sepsis treatment. In addition, the random forest algorithm is employed for feature selection to improve model interpretability. We test our algorithm against state-of-the-art value function methods in terms of expected return, survival rate and action distribution. The results demonstrate that WD3QNE obtains the highest survival rate of 97.81%. Our proposed method is capable of providing reliable treatment decisions with embedded clinician expertise.
