Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (2012)
DOI: 10.1002/9781118453988.ch24
Feature Selection for Neuro‐Dynamic Programming

Abstract: Neuro-Dynamic Programming encompasses techniques from both reinforcement learning and approximate dynamic programming. Feature selection refers to the choice of basis that defines the function class required in the application of these techniques. This chapter reviews two popular approaches to neuro-dynamic programming, TD-learning and Q-learning. The main goal of the chapter is to demonstrate how insight from idealized models can be used as a guide for feature selection for these algorithms. Several app…
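The abstract's central point is that the function class is fixed by the choice of basis ψ. A minimal sketch, assuming a discounted-cost TD(0) update with linear features (an illustration, not taken from the chapter):

```python
import numpy as np

# Illustrative sketch (not from the chapter): TD(0) with the linear
# parameterization V_theta(x) = theta^T psi(x).  Choosing the basis psi is
# the "feature selection" problem the abstract refers to.

def td0_linear(trajectory, psi, gamma=0.95, step=0.05):
    """Estimate theta from (x, c, x_next) transitions along one trajectory."""
    theta = np.zeros_like(psi(trajectory[0][0]), dtype=float)
    for x, c, x_next in trajectory:
        # temporal-difference error for the discounted-cost setting
        d = c + gamma * theta @ psi(x_next) - theta @ psi(x)
        theta = theta + step * d * psi(x)
    return theta
```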

Cited by 6 publications (4 citation statements) · References 24 publications
“…Common elements in all of our experiments are a linear parameterization for the value function, and the implementation of the ∇-LSTD algorithm. Comparisons with other approaches include the standard LSTD algorithm for discounted cost, and the regenerative LSTD algorithm of [1,8] for average cost applications where there is regeneration. The standard TD(λ) algorithm was also considered, but in each example the variance was found to be several orders of magnitude greater than alternatives.…”
Section: Simulation Results (mentioning, confidence: 99%)
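The quoted experiments compare ∇-LSTD against the standard discounted-cost LSTD estimator. A minimal sketch of that standard baseline follows, with a placeholder feature map ψ; the ∇-LSTD algorithm itself is defined in the cited work and is not reproduced here.

```python
import numpy as np

# Minimal sketch of the standard discounted-cost LSTD estimator used as a
# baseline in the quoted experiments.  The feature map psi is a placeholder;
# the ∇-LSTD algorithm itself is defined in the cited work.

def lstd(trajectory, psi, gamma=0.95, reg=1e-6):
    """Solve A theta = b with A = sum psi(x)(psi(x) - gamma psi(x'))^T and
    b = sum psi(x) c."""
    d = len(psi(trajectory[0][0]))
    A = reg * np.eye(d)              # small regularizer keeps A invertible
    b = np.zeros(d)
    for x, c, x_next in trajectory:
        f = psi(x)
        A += np.outer(f, f - gamma * psi(x_next))
        b += f * c
    return np.linalg.solve(A, b)
```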
“…Consider the linear parameterization (8) in which ψ : ℝᵈ → ℝᵈ is continuously differentiable, and assume as well that c is continuously differentiable. The ∇-LSTD learning algorithm is then defined by the recursion of the differential least-squares TD-learning algorithm.…”
Section: LSTD Algorithms (mentioning, confidence: 99%)
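For orientation only, the fixed point solved by the standard discounted-cost LSTD under a linear parameterization V_θ(x) = θ⊤ψ(x) can be written as below; this is a reference point, not the ∇-LSTD recursion referred to in the quote, which is given in the cited work.

```latex
% Standard discounted-cost LSTD fixed point for V_\theta(x) = \theta^\top \psi(x);
% a reference point only -- the \nabla-LSTD recursion itself is in the cited work.
\theta^\ast = A^{-1} b, \qquad
A = \mathrm{E}\!\left[\psi(X_n)\bigl(\psi(X_n) - \gamma\,\psi(X_{n+1})\bigr)^{\top}\right], \qquad
b = \mathrm{E}\!\left[\psi(X_n)\,c(X_n)\right].
```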
“…The upshot of stochastic approximation is that it can be implemented without knowledge of the function f or of the distribution of ξ; rather, it can rely on observations of the sequence {f (θ n , ξ n )}. This is one reason why these algorithms are valuable in the context of reinforcement learning (RL) [2], [6], [7], [8], [9]. In such cases, the driving noise is typically modeled as a Markov chain.…”
Section: Introduction and Proposed Framework (mentioning, confidence: 99%)
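The quote describes the generic stochastic-approximation recursion θ_{n+1} = θ_n + a_n f(θ_n, ξ_n), driven only by observed values of f(θ_n, ξ_n). A minimal sketch, with a hypothetical scalar observation model chosen so the root is known:

```python
import numpy as np

# Sketch of the generic stochastic-approximation recursion described in the
# quote: theta_{n+1} = theta_n + a_n * f(theta_n, xi_n).  Only observed values
# of f(theta_n, xi_n) are needed, not f itself or the distribution of xi.

def stochastic_approximation(observe, theta0, n_iter=10_000):
    theta = float(theta0)
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n                    # vanishing step-size sequence
        theta += a_n * observe(theta)
    return theta

# Hypothetical example: f(theta, xi) = -(theta - xi) with xi ~ N(2, 1),
# so the root of E[f(theta, xi)] = 0 is theta* = 2.
rng = np.random.default_rng(0)
theta_hat = stochastic_approximation(lambda th: -(th - rng.normal(2.0, 1.0)), 0.0)
```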
“…The Bellman error is included as an additional basis function in each iteration, thereby increasing the dimension of the subspace at each iteration. In [14], an a priori approximation of the value function, obtained from a simplified model, is used as one of the features. In [1], an approach that combines state aggregation with linear function approximation is presented.…”
Section: Introduction (mentioning, confidence: 99%)
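The idea attributed to [14] in the quote can be illustrated by augmenting a generic feature vector with a value-function approximation computed from a simplified model; the polynomial coordinates and the placeholder V0 below are assumptions for illustration, not the construction used in [14].

```python
import numpy as np

# Sketch of the feature-selection idea attributed to [14] in the quote:
# include an a priori value-function approximation V0, computed from a
# simplified model, as one coordinate of the feature vector.  Both V0 and
# the polynomial coordinates below are placeholders, not the chapter's choice.

def make_features(V0, degree=2):
    """Return psi(x) = [1, x, ..., x**degree, V0(x)] for scalar states x."""
    def psi(x):
        return np.array([x ** k for k in range(degree + 1)] + [V0(x)], dtype=float)
    return psi

# Hypothetical V0 from an idealized model, e.g. a quadratic approximation.
psi = make_features(V0=lambda x: 0.5 * x ** 2)
```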