2014
DOI: 10.48550/arxiv.1405.6757
Preprint

Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces

Abstract: Reinforcement learning is a simple yet comprehensive theory of learning that simultaneously models the adaptive behavior of artificial agents, such as robots and autonomous software programs, and attempts to explain the emergent behavior of biological systems. It also gives rise to computational ideas that provide a powerful tool for solving problems involving sequential prediction and decision making. Temporal difference learning is the most widely used method to solve reinforcement learning problem…

Cited by 23 publications (31 citation statements)
References 93 publications (146 reference statements)
“…The line of research reported here has much in common with work on proximal reinforcement learning [Mahadevan et al, 2014], which explores first-order reinforcement learning algorithms using mirror maps [Bubeck, 2014; Juditsky et al, 2008] to construct primal-dual spaces. This work began originally with a dual space formulation of first-order sparse TD learning.…”
Section: Antos
confidence: 99%
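To make the mirror-map construction concrete, here is a minimal sketch (not the authors' algorithm) of one mirror-descent TD(0) step with a p-norm mirror map ψ(w) = ½‖w‖²_q, a standard choice in the sparse first-order TD literature. The function names, the choice of p > 1, and the use of the plain TD(0) semi-gradient are illustrative assumptions.

```python
import numpy as np

def grad_psi(w, q):
    """Gradient of the mirror map psi(w) = 0.5 * ||w||_q^2."""
    norm = np.linalg.norm(w, ord=q)
    if norm == 0.0:
        return np.zeros_like(w)
    return np.sign(w) * np.abs(w) ** (q - 1) / norm ** (q - 2)

def grad_psi_star(y, p):
    """Gradient of the conjugate psi*(y) = 0.5 * ||y||_p^2 (dual -> primal map)."""
    norm = np.linalg.norm(y, ord=p)
    if norm == 0.0:
        return np.zeros_like(y)
    return np.sign(y) * np.abs(y) ** (p - 1) / norm ** (p - 2)

def mirror_td0_step(theta, phi_s, phi_s_next, reward, gamma, alpha, p):
    """One mirror-descent TD(0) step: gradient step in the dual space,
    then map back to the primal space through the conjugate mirror map."""
    q = p / (p - 1.0)                                            # 1/p + 1/q = 1
    delta = reward + gamma * theta @ phi_s_next - theta @ phi_s  # TD error
    td_grad = -delta * phi_s                                     # semi-gradient of 0.5 * delta^2
    y = grad_psi(theta, q) - alpha * td_grad                     # dual-space update
    return grad_psi_star(y, p)                                   # back to primal weights
```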
“…A sparse off-policy GTD2 algorithm with regularized dual averaging is introduced by Qin and Li [2014]. These studies provide different approaches to formulating the problem: as a variational inequality problem [Juditsky et al, 2008; Mahadevan et al, 2014], as a linear inverse problem, or as a quadratic objective function (MSPBE) solved with two-time-scale solvers [Qin and Li, 2014]. In this paper, we explore the true nature of the GTD algorithms as stochastic gradient algorithms w.r.t. the convex-concave saddle-point formulations of NEU and MSPBE.…”
Section: Antos
confidence: 99%
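For reference, the GTD2 update can be read directly as a two-time-scale primal-dual stochastic gradient method on the saddle-point form of the MSPBE. The sketch below is illustrative rather than the exact pseudocode of any of the cited papers; variable names are assumptions, and the dual step size beta is typically taken larger than the primal step size alpha.

```python
import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update viewed as stochastic gradient descent/ascent on the
    convex-concave saddle-point formulation of the MSPBE: w (dual) tracks
    E[delta * phi], theta (primal) moves along the corrected TD direction."""
    delta = reward + gamma * theta @ phi_next - theta @ phi           # TD error
    w_new = w + beta * (delta - phi @ w) * phi                        # dual ascent step
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ w)  # primal descent step
    return theta_new, w_new
```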
“…Introducing regularization in the GTD objective is not new. Mahadevan et al (2014) introduce the proximal GTD learning framework to integrate GTD algorithms with first-order optimization-based regularization via saddle-point formulations and proximal operators. Yu (2017) introduces a general regularization term for improving robustness.…”
Section: Gradient Emphasis Learning
confidence: 99%
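In this framework the regularizer enters through its proximal operator. As a minimal, hypothetical illustration (not Mahadevan et al.'s or Yu's specific regularized objective), an l1 penalty gives the closed-form soft-thresholding operator, applied after the gradient step on the smooth part of the loss:

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1: the closed-form minimizer of
    0.5 * ||z - x||^2 + tau * ||z||_1 over z (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_gradient_step(theta, grad, alpha, lam):
    """Generic proximal-gradient step: gradient step on the smooth loss,
    then the proximal map of the l1 regularizer."""
    return soft_threshold(theta - alpha * grad, alpha * lam)
```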
“…Here, the double sampling issue refers to the need for two independent samples of the next state from the current state in order to obtain an unbiased stochastic estimate of the gradient of the objective, mainly due to its quadratic nonlinearity. Alternatively, [28], [39] get around this difficulty by resorting to min-max reformulations of the MSBE and MSPBE and introduce primal-dual type methods for policy evaluation with finite-sample analysis. Similar ideas have also been employed for policy optimization based on the (softmax) Bellman optimality equation; see, e.g., [34] (the Smoothed Bellman Error Embedding (SBEED) algorithm).…”
Section: B. Modern Optimization-Based RL Algorithms
confidence: 99%
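To see where double sampling comes from, consider the MSBE, E_s[(E[δ | s])²]: its gradient contains a product of two conditional expectations, so an unbiased estimate needs two next states drawn independently from the same state. The sketch below is illustrative only and assumes linear value features; the min-max reformulations in [28], [39] avoid the second sample by introducing a dual variable.

```python
import numpy as np

def msbe_grad_estimate(theta, phi, reward_1, phi_next_1, phi_next_2, gamma):
    """Unbiased estimate of (half) the MSBE gradient at a state s, assuming
    linear values V(s) = theta @ phi(s). The TD error uses the first next-state
    sample; the gradient factor uses the second, independent sample, so the two
    conditional expectations factor correctly (the double sampling requirement)."""
    delta = reward_1 + gamma * theta @ phi_next_1 - theta @ phi   # sample 1: TD error
    return delta * (gamma * phi_next_2 - phi)                     # sample 2: d(delta)/d(theta)
```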