Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (2012)
DOI: 10.1002/9781118453988.ch24
Feature Selection for Neuro‐Dynamic Programming

Abstract: Neuro-Dynamic Programming encompasses techniques from both reinforcement learning and approximate dynamic programming. Feature selection refers to the choice of basis that defines the function class required in the application of these techniques. This chapter reviews two popular approaches to neuro-dynamic programming, TD-learning and Q-learning. The main goal of the chapter is to demonstrate how insight from idealized models can be used as a guide for feature selection for these algorithms. Several app…
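The abstract's central point is that the function class is fixed by the choice of basis ψ. A minimal sketch, assuming a discounted-cost TD(0) update with linear features (an illustration, not taken from the chapter):

```python
import numpy as np

# Illustrative sketch (not from the chapter): TD(0) with the linear
# parameterization V_theta(x) = theta^T psi(x).  Choosing the basis psi is
# the "feature selection" problem the abstract refers to.

def td0_linear(trajectory, psi, gamma=0.95, step=0.05):
    """Estimate theta from (x, c, x_next) transitions along one trajectory."""
    theta = np.zeros_like(psi(trajectory[0][0]), dtype=float)
    for x, c, x_next in trajectory:
        # temporal-difference error for the discounted-cost setting
        d = c + gamma * theta @ psi(x_next) - theta @ psi(x)
        theta = theta + step * d * psi(x)
    return theta
```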

Cited by 6 publications (4 citation statements) · References 24 publications
“…Common elements in all of our experiments are a linear parameterization for the value function, and the implementation of the ∇-LSTD algorithm. Comparisons with other approaches include the standard LSTD algorithm for discounted cost, and the regenerative LSTD algorithm of [1,8] for average cost applications where there is regeneration. The standard TD(λ) algorithm was also considered, but in each example the variance was found to be several orders of magnitude greater than alternatives.…”
Section: Simulation Results (mentioning, confidence: 99%)
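The quoted experiments compare ∇-LSTD against the standard discounted-cost LSTD estimator. A minimal sketch of that standard baseline follows, with a placeholder feature map ψ; the ∇-LSTD algorithm itself is defined in the cited work and is not reproduced here.

```python
import numpy as np

# Minimal sketch of the standard discounted-cost LSTD estimator used as a
# baseline in the quoted experiments.  The feature map psi is a placeholder;
# the ∇-LSTD algorithm itself is defined in the cited work.

def lstd(trajectory, psi, gamma=0.95, reg=1e-6):
    """Solve A theta = b with A = sum psi(x)(psi(x) - gamma psi(x'))^T and
    b = sum psi(x) c."""
    d = len(psi(trajectory[0][0]))
    A = reg * np.eye(d)              # small regularizer keeps A invertible
    b = np.zeros(d)
    for x, c, x_next in trajectory:
        f = psi(x)
        A += np.outer(f, f - gamma * psi(x_next))
        b += f * c
    return np.linalg.solve(A, b)
```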
“…Consider the linear parameterization (8) in which ψ : ℝᵈ → ℝᵈ is continuously differentiable, and assume as well that c is continuously differentiable. The ∇-LSTD learning algorithm is then defined by the recursion of the differential least-squares TD-learning algorithm.…”
Section: LSTD Algorithms (mentioning, confidence: 99%)
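For orientation only, the fixed point solved by the standard discounted-cost LSTD under a linear parameterization V_θ(x) = θ⊤ψ(x) can be written as below; this is a reference point, not the ∇-LSTD recursion referred to in the quote, which is given in the cited work.

```latex
% Standard discounted-cost LSTD fixed point for V_\theta(x) = \theta^\top \psi(x);
% a reference point only -- the \nabla-LSTD recursion itself is in the cited work.
\theta^\ast = A^{-1} b, \qquad
A = \mathrm{E}\!\left[\psi(X_n)\bigl(\psi(X_n) - \gamma\,\psi(X_{n+1})\bigr)^{\top}\right], \qquad
b = \mathrm{E}\!\left[\psi(X_n)\,c(X_n)\right].
```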
“…The upshot of stochastic approximation is that it can be implemented without knowledge of the function f or of the distribution of ξ; rather, it can rely on observations of the sequence {f (θ n , ξ n )}. This is one reason why these algorithms are valuable in the context of reinforcement learning (RL) [2], [6], [7], [8], [9]. In such cases, the driving noise is typically modeled as a Markov chain.…”
Section: Introduction and Proposed Framework (mentioning, confidence: 99%)
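The quote describes the generic stochastic-approximation recursion θ_{n+1} = θ_n + a_n f(θ_n, ξ_n), driven only by observed values of f(θ_n, ξ_n). A minimal sketch, with a hypothetical scalar observation model chosen so the root is known:

```python
import numpy as np

# Sketch of the generic stochastic-approximation recursion described in the
# quote: theta_{n+1} = theta_n + a_n * f(theta_n, xi_n).  Only observed values
# of f(theta_n, xi_n) are needed, not f itself or the distribution of xi.

def stochastic_approximation(observe, theta0, n_iter=10_000):
    theta = float(theta0)
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n                    # vanishing step-size sequence
        theta += a_n * observe(theta)
    return theta

# Hypothetical example: f(theta, xi) = -(theta - xi) with xi ~ N(2, 1),
# so the root of E[f(theta, xi)] = 0 is theta* = 2.
rng = np.random.default_rng(0)
theta_hat = stochastic_approximation(lambda th: -(th - rng.normal(2.0, 1.0)), 0.0)
```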
“…The Bellman error is included as an additional basis function in each iteration, thereby increasing the dimension of the subspace at each iteration. In [14], an a priori approximation of the value function, obtained from a simplified model, is used as one of the features. In [1], an approach that combines state aggregation with linear function approximation is presented.…”
Section: Introduction (mentioning, confidence: 99%)
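The idea attributed to [14] in the quote can be illustrated by augmenting a generic feature vector with a value-function approximation computed from a simplified model; the polynomial coordinates and the placeholder V0 below are assumptions for illustration, not the construction used in [14].

```python
import numpy as np

# Sketch of the feature-selection idea attributed to [14] in the quote:
# include an a priori value-function approximation V0, computed from a
# simplified model, as one coordinate of the feature vector.  Both V0 and
# the polynomial coordinates below are placeholders, not the chapter's choice.

def make_features(V0, degree=2):
    """Return psi(x) = [1, x, ..., x**degree, V0(x)] for scalar states x."""
    def psi(x):
        return np.array([x ** k for k in range(degree + 1)] + [V0(x)], dtype=float)
    return psi

# Hypothetical V0 from an idealized model, e.g. a quadratic approximation.
psi = make_features(V0=lambda x: 0.5 * x ** 2)
```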