49th IEEE Conference on Decision and Control (CDC) 2010
DOI: 10.1109/cdc.2010.5717385
Adaptive bases for Q-learning

Cited by 4 publications (7 citation statements); references 18 publications.
“…A key contribution of this paper is a general framework for nonlinear value function approximation based on nonlinear separable least-squares. This framework helps clarify previously proposed algorithms (Castro and Mannor 2010; Bertsekas and Yu 2009; Menache, Shimkin, and Mannor 2005), which amount to particular instances of this general framework. Concretely, basis adaptation can be understood as modifying a parameterized basis, Φ(α), where α denotes the nonlinear parameters and w denotes the linear parameters, such that V ≈ Φ(α)w. For example, for radial basis function (RBF) bases, α would correspond to the mean (center), μ_i, and covariance (width) parameters, Σ_i, of a fixed set of bases, where these would be tuned based on optimizing some particular error, such as the Bellman error (Menache, Shimkin, and Mannor 2005).…”
Section: Nonlinear Value Function Approximation (mentioning)
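The separable structure V ≈ Φ(α)w described in this excerpt can be illustrated with a small variable-projection sketch: for each candidate α the linear weights w are obtained in closed form by least squares, and α is then chosen to minimize the resulting residual. Several assumptions here are not from the source: a 1-D state, fixed RBF centers with a single shared width standing in for α, a grid search over that width instead of a gradient method, sampled targets from a hypothetical value function fitted directly (rather than a Bellman-error objective), and a tiny ridge term added only for numerical stability.

```python
import math


def rbf_features(s, centers, width):
    """Phi(alpha) evaluated at state s: one Gaussian bump per center."""
    return [math.exp(-((s - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(states, targets, centers, width, ridge=1e-8):
    """Inner problem: w*(alpha) = argmin_w ||Phi(alpha) w - y||^2 (plus a tiny ridge, an assumption)."""
    phi = [rbf_features(s, centers, width) for s in states]
    k = len(centers)
    gram = [[sum(p[i] * p[j] for p in phi) + (ridge if i == j else 0.0) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p[i] * y for p, y in zip(phi, targets)) for i in range(k)]
    w = solve(gram, rhs)  # normal equations (Phi^T Phi + ridge I) w = Phi^T y
    resid = sum((sum(wi * pi for wi, pi in zip(w, p)) - y) ** 2 for p, y in zip(phi, targets))
    return w, resid


def separable_fit(states, targets, centers, widths):
    """Outer problem: pick the width whose projected residual r(alpha) is smallest."""
    best = None
    for width in widths:
        w, resid = fit_linear(states, targets, centers, width)
        if best is None or resid < best[2]:
            best = (width, w, resid)
    return best  # (width, w, residual)
```

For example, with 41 states on [0, π], targets sin(s), and centers at multiples of π/4, `separable_fit(states, targets, centers, [0.05, 0.3, 1.0])` rejects the 0.05 width, whose narrow bumps leave most states between centers essentially unfitted. The point of the split is that w never has to be searched for: it is a closed-form function of α, so only the nonlinear parameters are optimized iteratively.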
“…Like the approach proposed for adaptive bases for Q-learning (Castro and Mannor 2010), our method is based on the two-timescale stochastic approximation framework, whereby the linear parameters w are adapted at a faster timescale than the nonlinear parameters α. Algorithm 1 below describes the nonlinear adaptive basis mirror-descent variant of Watkins' Q(λ) algorithm.³ We indicate the dynamically varying nature of the bases as φ_t(s_t, a_t), where the subscript denotes the particular value of the nonlinear parameters α_t at time t. In this section, we denote β_t as the learning rate for the faster time-scale update procedure for updating the linear weights w_t and ξ_t as the learning rate for the slower time-scale update procedure for updating the nonlinear basis parameters α_t.…”
Section: Basis Adaptation With Mirror Descent RL (mentioning)
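The two-timescale idea in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of either paper: it uses one-step Q-learning (λ = 0, no eligibility traces) and plain gradient steps instead of mirror descent, a hypothetical toy chain task, Gaussian RBF features whose centers play the role of α (width held fixed), and a semi-gradient rule α ← α + ξ_t δ_t ∇_α Q(s, a) for the basis. The schedules β_t = 0.5/(t+1)^0.6 and ξ_t = 0.05/(t+1)^0.9 are illustrative choices with ξ_t/β_t → 0, so w moves on the faster timescale.

```python
import math
import random

GAMMA = 0.9
ACTIONS = (-1, +1)  # hypothetical toy task: step left or right on [0, 1]
STEP = 0.1


def features(s, centers, width):
    # phi_alpha(s): Gaussian RBFs; alpha = the centers (width held fixed here)
    return [math.exp(-((s - c) ** 2) / (2 * width ** 2)) for c in centers]


def q_value(s, a, w, centers, width):
    return sum(wk * pk for wk, pk in zip(w[a], features(s, centers, width)))


def env_step(s, a):
    # Toy chain: reward 1 and termination once the agent reaches s >= 1
    s_next = max(0.0, s + ACTIONS[a] * STEP)
    done = s_next >= 1.0 - 1e-9
    return s_next, (1.0 if done else 0.0), done


def update(t, s, a, r, s_next, done, w, centers, width):
    beta = 0.5 / (t + 1) ** 0.6  # fast timescale: linear weights w
    xi = 0.05 / (t + 1) ** 0.9   # slow timescale: basis parameters alpha
    phi = features(s, centers, width)
    n_a = len(ACTIONS)
    target = r if done else r + GAMMA * max(q_value(s_next, b, w, centers, width) for b in range(n_a))
    delta = target - q_value(s, a, w, centers, width)
    # dQ/dc_k = w[a][k] * phi_k(s) * (s - c_k) / width^2, computed with the current w
    grads = [w[a][k] * phi[k] * (s - centers[k]) / width ** 2 for k in range(len(centers))]
    for k in range(len(centers)):  # fast step: TD(0) update of the taken action's weights
        w[a][k] += beta * delta * phi[k]
    for k in range(len(centers)):  # slow step: semi-gradient move of each center, kept in [0, 1]
        centers[k] = min(1.0, max(0.0, centers[k] + xi * delta * grads[k]))
    return delta


def train(episodes, n_centers=5, width=0.2, seed=0):
    rng = random.Random(seed)
    centers = [k / (n_centers - 1) for k in range(n_centers)]
    w = [[0.0] * n_centers for _ in ACTIONS]
    t = 0
    for _ in range(episodes):
        s = 0.0
        for _ in range(200):  # cap episode length
            a = rng.randrange(len(ACTIONS))  # uniform behavior policy; Q-learning is off-policy
            s_next, r, done = env_step(s, a)
            update(t, s, a, r, s_next, done, w, centers, width)
            t += 1
            s = s_next
            if done:
                break
    return w, centers
```

With a zero-initialized w, the first update leaves the centers unchanged, because the gradient with respect to each center is proportional to the corresponding weight; the basis only starts to move once the fast-timescale weights have picked up some value, which is the intended ordering of the two timescales.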