Wenhao Zhan scite author profile

Policy optimization, which learns the policy of interest by maximizing the value function via large-scale optimization techniques, lies at the heart of modern reinforcement learning (RL). In addition to value maximization, other practical considerations commonly arise as well, including the need to encourage exploration and the need to ensure certain structural properties of the learned policy due to safety, resource, and operational constraints. These considerations can often be accounted for by resorting to regularized RL, which augments the target value function with a structure-promoting regularization term. Focusing on an infinite-horizon discounted Markov decision process, this paper proposes a generalized policy mirror descent (GPMD) algorithm for solving regularized RL. As a generalization of policy mirror descent (Lan, 2021), the proposed algorithm accommodates a general class of convex regularizers as well as a broad family of Bregman divergences cognizant of the regularizer in use. We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution, even when the regularizer lacks strong convexity and smoothness. In addition, this linear convergence feature is provably stable in the face of inexact policy evaluation and imperfect policy updates. Numerical experiments are provided to corroborate the applicability and appealing performance of GPMD.
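To make the idea of a regularized mirror descent update concrete, here is a minimal sketch of one well-known special case: a negative-entropy regularizer paired with the KL divergence on a small tabular MDP. It is not the paper's implementation. The random MDP, the function names (`soft_q`, `pmd_step`), and the parameter values (`gamma`, `tau`, `eta`) are all assumptions chosen for illustration, and policy evaluation is done exactly rather than inexactly.

```python
import numpy as np

def soft_q(P, r, pi, gamma, tau):
    """Exact entropy-regularized Q-function of a fixed policy pi (tabular case)."""
    S = r.shape[0]
    # Regularized one-step reward under pi: E_a[r(s,a) - tau * log pi(a|s)]
    r_pi = np.sum(pi * (r - tau * np.log(pi)), axis=1)
    # State-to-state transition matrix under pi
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve the linear Bellman equation V = r_pi + gamma * P_pi V
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V

def pmd_step(pi, Q, eta, tau):
    """One mirror descent step with KL divergence and entropy regularization.

    Closed form of: argmax_p <Q, p> + tau * H(p) - (1/eta) * KL(p || pi),
    which gives p proportional to pi^(1/(1+eta*tau)) * exp(eta*Q/(1+eta*tau)).
    """
    logits = (np.log(pi) + eta * Q) / (1.0 + eta * tau)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Hypothetical random MDP, used only to exercise the update.
rng = np.random.default_rng(0)
S, A, gamma, tau, eta = 4, 3, 0.9, 0.5, 1.0
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # each P[s, a, :] is a distribution
r = rng.random((S, A))

pi = np.full((S, A), 1.0 / A)       # start from the uniform policy
for _ in range(1000):
    Q = soft_q(P, r, pi, gamma, tau)
    pi = pmd_step(pi, Q, eta, tau)
Q = soft_q(P, r, pi, gamma, tau)    # regularized Q of the final policy
```

In this special case the fixed point of the update is the softmax policy with respect to the regularized Q-function scaled by `1/tau`, which gives a simple way to check a run. Other regularizers and Bregman divergences, which the abstract says the algorithm accommodates, generally do not admit such a closed-form step.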

Stabilization of heavy metal-contaminated soils by biochar: Challenges and recommendations

Wang

Liu

Zhan

et al. 2020

Science of The Total Environment

216

Highly effective stabilization of Cd and Cu in two different soils and improvement of soil properties by multiple-modified biochar

Wang

Zheng

Zhan

et al. 2021

Ecotoxicology and Environmental Safety

106

Long-term stabilization of Cd in agricultural soil using mercapto-functionalized nano-silica (MPTS/nano-silica): A three-year field study

Wang

Liu

Zhan

et al. 2020

Ecotoxicology and Environmental Safety

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Zhan

Huang

Huang

et al. 2022

Preprint

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability). Despite recent efforts to relax these assumptions, existing works are only able to relax one of the two factors, leaving the strong assumption on the other factor intact. As an important open problem, can we achieve sample-efficient offline RL with weak assumptions on both factors? In this paper we answer the question in the positive. We analyze a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables (discounted occupancy) are modeled using a density-ratio function against offline data. With proper regularization, we show that the algorithm enjoys polynomial sample complexity, under only realizability and single-policy concentrability. We also provide alternative analyses based on different assumptions to shed light on the nature of primal-dual algorithms for offline RL.
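For readers unfamiliar with the primal-dual formulation mentioned in the abstract, the standard linear-programming view of a discounted MDP can be sketched as follows. The notation is an assumption made for this sketch and is not taken from the abstract: \mu_0 is the initial state distribution, v a value function, d a normalized discounted occupancy, d^D the offline data distribution, and w = d / d^D the density ratio. The regularization term the abstract refers to is not shown.

```latex
% Primal LP over value functions v
\min_{v}\ (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}[v(s)]
\quad \text{s.t.} \quad
v(s) \ge r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}[v(s')]
\quad \forall (s,a)

% Lagrangian with multipliers d(s,a) \ge 0 (the discounted occupancy)
L(v,d) = (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}[v(s)]
 + \sum_{s,a} d(s,a)\,\bigl(r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}[v(s')] - v(s)\bigr)

% Same Lagrangian rewritten with the density ratio w = d / d^D,
% so the sum becomes an expectation over offline data
L(v,w) = (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}[v(s)]
 + \mathbb{E}_{(s,a)\sim d^D}\Bigl[w(s,a)\,\bigl(r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}[v(s')] - v(s)\bigr)\Bigr]
```

The last form is what makes the approach usable offline: every expectation is taken either over the initial distribution or over the data distribution, so it can be estimated from logged samples.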

Wenhao Zhan

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
