Deep brain stimulation (DBS) has shown great promise in treating motor
symptoms caused by Parkinson's disease (PD) by delivering electrical pulses to
the basal ganglia (BG) region of the brain. However, DBS devices approved by
the U.S. Food and Drug Administration (FDA) can only deliver continuous DBS
(cDBS) stimuli at a fixed amplitude; this energy-inefficient operation reduces
the battery lifetime of the device, cannot adapt treatment dynamically to
patient activity, and may cause significant side effects (e.g., gait impairment). In
this work, we introduce an offline reinforcement learning (RL) framework,
allowing the use of past clinical data to train an RL policy to adjust the
stimulation amplitude in real time, with the goal of reducing energy use while
maintaining the same level of treatment (i.e., control) efficacy as cDBS.
Moreover, clinical protocols require the safety and performance of such RL
controllers to be demonstrated ahead of deployment in patients. Thus, we also
introduce an offline policy evaluation (OPE) method that uses historical data
to estimate the performance of RL policies before they are deployed. We
evaluated our framework on four PD patients equipped with the RC+S DBS system,
employing the RL controllers during monthly clinical visits, with the overall
control efficacy evaluated by severity of symptoms (i.e., bradykinesia and
tremor), changes in PD biomarkers (i.e., local field potentials), and patient
ratings. The results from clinical experiments show that our RL-based
controller maintains the same level of control efficacy as cDBS, but with
significantly reduced stimulation energy. Further, the OPE method is shown to
be effective in accurately estimating and ranking the expected returns of RL
controllers.