Offline Policy Evaluation for Learning-based Deep Brain Stimulation Controllers

Gao, Qitong; Schmidt, Stephen L.; Kamaravelu, Karthik; Turner, Dennis A.; Grill, Warren M.; Pajić, Miroslav

doi:10.1109/iccps54341.2022.00014

Cited by 5 publications

(16 citation statements)

References 31 publications


“…Moreover, RSA and the branching can lead to increased expressiveness and robustness, such that future states and rewards are predicted accurately. There also exist OPE methods proposed toward specific applications (Chen et al, 2022; Saito et al, 2021; Gao et al, 2023; 2022b).…”

Section: Related Work (mentioning)

confidence: 99%

“…We focus on using the VLBM to facilitate OPE since it allows us to better distinguish the improvements made upon learning dynamics underlying the MDP used for estimating policy returns, as opposed to RL training where performance can be affected by multiple factors, e.g., techniques used for exploration and policy optimization. Moreover, model-based OPE methods are helpful for evaluating the safety and efficacy of RL-based controllers before deployment in the real world (Gao et al, 2022b), e.g., how a surgical robot would react to states that are critical to a successful procedure.…”

Section: Introduction (mentioning)

confidence: 99%


Variational Latent Branching Model for Off-Policy Evaluation

Gao, Chi, et al. 2023

Preprint


Model-based methods have recently shown great potential for off-policy evaluation (OPE); offline trajectories induced by behavioral policies are fitted to transitions of Markov decision processes (MDPs), which are used to roll out simulated trajectories and estimate the performance of policies. Model-based OPE methods face two key challenges. First, as offline trajectories are usually fixed, they tend to cover limited state and action space. Second, the performance of model-based methods can be sensitive to the initialization of their parameters. In this work, we propose the variational latent branching model (VLBM) to learn the transition function of MDPs by formulating the environmental dynamics as a compact latent space, from which the next states and rewards are then sampled. Specifically, VLBM leverages and extends the variational inference framework with the recurrent state alignment (RSA), which is designed to capture as much of the information underlying the limited training data as possible, by smoothing out the information flow between the variational (encoding) and generative (decoding) parts of VLBM. Moreover, we also introduce the branching architecture to improve the model's robustness against randomly initialized model weights. The effectiveness of the VLBM is evaluated on the deep OPE (DOPE) benchmark, from which the training trajectories are designed to result in varied coverage of the state-action space. We show that the VLBM outperforms existing state-of-the-art OPE methods in general.
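The generic model-based OPE recipe described in this abstract (fit a transition model to offline trajectories, then roll out the target policy in the fitted model) can be illustrated with a minimal sketch. This is not the VLBM: it swaps in a plain least-squares linear model on a toy 1-D system. The dynamics coefficients, the reward `r = -s^2`, both policies, and all function names are assumptions made up for illustration.

```python
import random


def collect_offline_data(n_traj=200, horizon=20, seed=0):
    """Log (state, action, next_state) tuples under a noisy behavior policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_traj):
        s = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            # Hypothetical behavior policy with exploration noise.
            u = -0.3 * s + rng.gauss(0.0, 0.5)
            # Toy "true" dynamics, assumed for this sketch only.
            s_next = 0.9 * s + 0.5 * u + rng.gauss(0.0, 0.01)
            data.append((s, u, s_next))
            s = s_next
    return data


def fit_linear_model(data):
    """Least-squares fit of s_next ~ a*s + b*u via the 2x2 normal equations."""
    sss = sum(s * s for s, u, n in data)
    ssu = sum(s * u for s, u, n in data)
    suu = sum(u * u for s, u, n in data)
    ssn = sum(s * n for s, u, n in data)
    sun = sum(u * n for s, u, n in data)
    det = sss * suu - ssu * ssu
    a = (ssn * suu - sun * ssu) / det
    b = (sun * sss - ssn * ssu) / det
    return a, b


def estimate_return(a, b, policy, start_states, horizon=20, gamma=0.99):
    """Roll the policy out in the model s' = a*s + b*u; average discounted returns."""
    total = 0.0
    for s0 in start_states:
        s, ret = s0, 0.0
        for t in range(horizon):
            ret += (gamma ** t) * (-(s * s))  # assumed known reward r = -s^2
            s = a * s + b * policy(s)
        total += ret
    return total / len(start_states)


def target_policy(s):
    """Hypothetical target policy to be evaluated without running it for real."""
    return -1.0 * s


data = collect_offline_data()
a_hat, b_hat = fit_linear_model(data)
starts = [i / 10.0 for i in range(-10, 11)]
ope_estimate = estimate_return(a_hat, b_hat, target_policy, starts)
print(a_hat, b_hat, ope_estimate)
```

In this toy setting the reward is taken as known, whereas the abstract describes sampling both next states and rewards from the learned model. The sketch also sidesteps the two challenges the abstract names, since the behavior data here covers the relevant states well and a closed-form fit has no initialization.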


“…In this section, we first introduce DBS, before presenting in the next section the DBS experimental setup we developed for clinical trials, including sensing, communication and control. Also, preliminaries for offline RL and OPE are briefly introduced; more comprehensive reviews of RL and OPE can be found in [18,19,37,56].…”

Section: Preliminaries and Motivation (mentioning)

confidence: 99%

“…Physiologically, the effect of PD can be captured by the changes in LFPs in GPi, GPe and STN. Specifically, PD can cause abnormal neuron firings in these regions, and lead to increased beta-band (13-35 Hz) amplitude ($P_\beta$), referred to as the beta amplitude, of the LFPs [19].…”

Section: The Need for Closed-Loop DBS (mentioning)

confidence: 99%

“…Note that each of the latter three criteria is only evaluated once at the end of each trial; yet they are imperative for evaluating the control efficacy from the patient's side. These efficacy metrics are thus considered sparsely available compared to the LFPs that can be sensed in each time step, which limits the use of existing OPE methods, including importance sampling (IS) [15,53], distributional correction estimations (DICE) [45], and the model-based OPE [19], as these do not allow for explicitly capturing/modeling such end-of-session rewards. Our OPE method can capture such behaviors through a specially designed architecture and training objective, outperforming existing methods as we show in clinical experiments.…”
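For readers unfamiliar with the importance sampling (IS) baseline named in this statement, the following is a minimal textbook sketch of trajectory-wise ordinary IS. It is not the method of any cited paper. The single-state toy problem, the policies `pi_b` and `pi_e`, and all function names are hypothetical.

```python
import random


def simulate_behavior(n_traj=20000, horizon=2, seed=0):
    """Generate trajectories of (state, action, reward) under the behavior policy."""
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n_traj):
        traj = []
        for _ in range(horizon):
            a = 1 if rng.random() < 0.5 else 0  # behavior policy: uniform over {0, 1}
            r = float(a)                        # toy reward: 1 for action 1, else 0
            traj.append((0, a, r))              # single dummy state 0
        trajectories.append(traj)
    return trajectories


def pi_b(a, s):
    """Behavior policy probability of action a in state s (assumed known)."""
    return 0.5


def pi_e(a, s):
    """Target (evaluation) policy probability of action a in state s."""
    return 0.8 if a == 1 else 0.2


def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary trajectory-wise IS: average of (likelihood ratio) * (return)."""
    total = 0.0
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += (gamma ** t) * r            # discounted return of the trajectory
        total += weight * ret
    return total / len(trajectories)


is_value = importance_sampling_estimate(simulate_behavior(), pi_e, pi_b)
print(is_value)
```

In this toy problem the target policy's true two-step return is 2 × 0.8 = 1.6, so the estimate can be checked against it. The estimator only reweights whole-trajectory returns by policy likelihood ratios and contains no model of when or how a reward is generated.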

Section: Introduction (mentioning)

confidence: 99%


Offline Learning of Closed-Loop Deep Brain Stimulation Controllers for Parkinson Disease Treatment

Gao, Schmidt, Chowdhury, et al. 2023

Preprint


Deep brain stimulation (DBS) has shown great promise toward treating motor symptoms caused by Parkinson's disease (PD), by delivering electrical pulses to the basal ganglia (BG) region of the brain. However, DBS devices approved by the U.S. Food and Drug Administration (FDA) can only deliver continuous DBS (cDBS) stimuli at a fixed amplitude; this energy-inefficient operation reduces battery lifetime of the device, cannot adapt treatment dynamically for activity, and may cause significant side effects (e.g., gait impairment). In this work, we introduce an offline reinforcement learning (RL) framework, allowing the use of past clinical data to train an RL policy to adjust the stimulation amplitude in real time, with the goal of reducing energy use while maintaining the same level of treatment (i.e., control) efficacy as cDBS. Moreover, clinical protocols require the safety and performance of such RL controllers to be demonstrated ahead of deployments in patients. Thus, we also introduce an offline policy evaluation (OPE) method to estimate the performance of RL policies using historical data, before deploying them on patients. We evaluated our framework on four PD patients equipped with the RC+S DBS system, employing the RL controllers during monthly clinical visits, with the overall control efficacy evaluated by severity of symptoms (i.e., bradykinesia and tremor), changes in PD biomarkers (i.e., local field potentials), and patient ratings. The results from clinical experiments show that our RL-based controller maintains the same level of control efficacy as cDBS, but with significantly reduced stimulation energy. Further, the OPE method is shown to be effective in accurately estimating and ranking the expected returns of RL controllers.
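The abstract's closing claim concerns two distinct qualities of an OPE method: how accurately it estimates returns and how well it ranks policies. A minimal sketch of one common way to score each is given here, using mean absolute error and Spearman rank correlation. The numbers are invented placeholders, not results from the paper, and the function names are hypothetical; the rank formula assumes no tied values.

```python
def ranks(values):
    """Return the 0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)), no ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def mean_absolute_error(x, y):
    """Average absolute gap between estimated and reference returns."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Placeholder values for four hypothetical policies (not data from the paper).
true_returns = [1.0, 2.0, 3.0, 4.0]
ope_estimates = [1.2, 1.9, 3.5, 3.1]

rho = spearman_rho(true_returns, ope_estimates)
mae = mean_absolute_error(true_returns, ope_estimates)
print(rho, mae)
```

A method can rank policies well while estimating their returns poorly, and vice versa, which is why the two scores are usually reported separately.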
