This paper develops an off-policy reinforcement learning (RL) algorithm to solve the optimal synchronization problem for multiagent systems, using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of the agent dynamics, the proposed off-policy RL algorithm is model-free: it solves the optimal synchronization problem without requiring any knowledge of the agent dynamics. A prescribed control policy, called the behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function of the policy under evaluation, called the target policy, and simultaneously find an improved policy. Actor and critic neural networks, together with a least-squares approach, are employed to approximate the target control policies and value functions using the data generated by applying the prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method.
In this paper, a novel off-policy interleaved Q-learning algorithm is presented for solving the optimal control problem of affine nonlinear discrete-time (DT) systems, using only the data measured along the system trajectories. The affine nonlinear structure of the systems, the unknown dynamics, and the off-policy learning approach pose considerable challenges for approximating optimal controllers. To this end, the on-policy Q-learning method for optimal control of affine nonlinear DT systems is reviewed first, and its convergence is rigorously proven. The bias in the solution of the Q-function-based Bellman equation, caused by adding probing noise to the system to satisfy persistent excitation, is also analyzed for the on-policy Q-learning approach. Then, a behavior control policy is introduced and an off-policy Q-learning algorithm is proposed; the convergence of the algorithm, and the absence of bias in the solution of the optimal control problem when probing noise is added to the system, are investigated. Third, three neural networks are run by the interleaved Q-learning approach in the actor-critic framework. Thus, a novel off-policy interleaved Q-learning algorithm is derived and its convergence is proven. Simulation results are given to verify the effectiveness of the proposed method. Index Terms: Q-learning, off-policy learning, affine nonlinear systems, interleaved learning, optimal control.
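The distinction between behavior and target policies in the abstract above can be illustrated with a much simpler setting than the one the paper treats. The following is a minimal sketch, assuming a scalar linear system with a quadratic cost and a plain least-squares fit in place of the paper's affine nonlinear dynamics and three neural networks; it is not the authors' interleaved algorithm. All names (`features`, `collect_data`, `off_policy_q_iteration`) and all numerical values are hypothetical choices made for illustration. The learner never uses the system parameters `a` and `b`; it only sees the recorded tuples `(x, u, x_next)`. Because the Q-function Bellman equation holds for any action applied at the current step, the probing noise in the behavior policy does not bias the least-squares solution in this noise-free linear case.

```python
import numpy as np

def features(x, u):
    # Quadratic basis, so that Q(x, u) = theta . features(x, u)
    # with theta = [H11, H12, H22] (assumed parameterization).
    return np.array([x * x, 2.0 * x * u, u * u])

def collect_data(a, b, k_behavior, n_steps, rng):
    # Behavior policy: a fixed linear feedback plus probing noise.
    # The system x_next = a*x + b*u is used only to generate data.
    x = 1.0
    data = []
    for _ in range(n_steps):
        u = -k_behavior * x + rng.normal(scale=0.5)
        x_next = a * x + b * u
        data.append((x, u, x_next))
        x = x_next
    return data

def off_policy_q_iteration(data, q, r, k_init, n_iters=20):
    # Policy iteration on the target policy u = -k*x, reusing the same
    # behavior-policy data set at every iteration.
    k = k_init
    for _ in range(n_iters):
        rows, targets = [], []
        for x, u, x_next in data:
            # Target-policy action at the next state; never applied to the system.
            u_next = -k * x_next
            # Bellman equation: Q(x, u) - Q(x_next, u_next) = stage cost.
            rows.append(features(x, u) - features(x_next, u_next))
            targets.append(q * x * x + r * u * u)
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        # Policy improvement: minimize the quadratic Q over u.
        k = theta[1] / theta[2]
    return k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = collect_data(a=0.9, b=0.5, k_behavior=0.2, n_steps=200, rng=rng)
    print("learned gain:", off_policy_q_iteration(data, q=1.0, r=1.0, k_init=0.0))
```

In this sketch the initial target gain must be stabilizing, which holds here because the assumed open-loop system is already stable; the learned gain can be checked against the one obtained by iterating the scalar Riccati equation with the same parameters.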
Background: To assess the efficacy of intraoperative ultrasound-guided implantation of 125I seeds for the treatment of unresectable pancreatic carcinoma, and analyze the associated prognostic factors. Methods: Twenty-eight patients with pancreatic carcinoma who underwent laparotomy and were considered to have unresectable tumors were included in this study. Nine patients were pathologically diagnosed with Stage II disease, and nineteen patients with Stage III disease. Twenty-eight patients received intraoperative ultrasound-guided 125I seed implantation and received a D90 (at least 90% of the tumor volume received the reference dose) ranging from 60 to 163 Gy, with a median of 120 Gy. Seven patients received an additional 35–50 Gy external beam radiotherapy after seed implantation, and ten patients received two to ten cycles of chemotherapy. Overall survival of the patients was calculated and prognostic factors were evaluated. Results: Of the patients, 94.1% (16/17) achieved good to medium pain relief. The tumor response rate was 78.6% (22/28), and local control was achieved in 85.7% (24/28) of patients. The 1-, 2- and 3-year survival rates were 30%, 11% and 4%, and the median survival was 10.1 months (95% CI: 9.0–10.9). Analysis using the Cox proportional hazards model suggested that patients younger than 60 years and patients who received a D90 higher than 110 Gy may survive for a longer period. Conclusions: 125I seed implantation provides a safe and effective method to relieve pain, control local tumor growth and, to some extent, prolong the survival of patients with stage II and III pancreatic disease, without additional complications. Age and accumulated dose may be factors predictive of a favorable outcome for patients with unresectable pancreatic carcinoma treated with 125I seeds. These findings need to be validated by conducting further studies with larger cohorts.
Background: To assess the feasibility and efficacy of using 125I seed implantation under intraoperative ultrasound guidance for unresectable pancreatic carcinoma.
Background: The management of pediatric recurrent or metastatic soft tissue sarcoma after multimodal treatment remains challenging. We investigated the feasibility, efficacy, and morbidity of permanent interstitial 125I seed implantation under image guidance as a salvage treatment for pediatric patients with recurrent or metastatic soft tissue sarcoma. Methods: This was a retrospective study of 10 patients who underwent percutaneous ultrasound- or computed tomography (CT)-guided permanent 125I seed implantation. Postoperative dosimetry was performed for all patients. Actuarial D90 was 121–187.1 Gy (median, 170.3 Gy). The number of 125I seeds implanted was 6–158 (median, 34.5), with a median specific activity of 0.7 mCi per seed (range, 0.62–0.8 mCi); total activity was 4.2–113.76 mCi. Follow-up time was 6–107 months (median, 27.5 months); no patients were lost to follow-up. Results: The overall response rate (complete response + partial response) was 8/10 (80 %), including two patients with complete response (CR) (20 %) and five patients with partial response (PR) (60 %). Local control rates after 1 and 2 years were 70.1 and 62.3 %, respectively, with a mean local control time of 70.6 months (95 % confidence interval (CI) 45.1–96.0). Survival rates after 1 and 2 years were 68.6 and 57.1 %, respectively, with a mean survival time of 65.3 months (95 % CI 34.1–96.5). Three patients died from distant metastasis; one died from local recurrence 12 months after seed implantation. Three patients suffered a grade I skin reaction and one developed ulceration. No severe adverse neurologic sequelae or blood vessel damage occurred. Conclusions: Image-guided permanent interstitial 125I seed implantation as a salvage treatment appears to have a satisfactory outcome in children with recurrent or metastatic soft tissue sarcoma.