2005
DOI: 10.1007/s10479-005-5732-z

Basis Function Adaptation in Temporal Difference Reinforcement Learning

Abstract: Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorit…
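The linear architecture described in the abstract can be illustrated with a minimal TD(0) sketch. The Gaussian basis on [0, 1], the step size, and the discount factor below are illustrative assumptions, not the paper's algorithm; only the weights of the pre-fixed basis are adjusted.

```python
import numpy as np

# Hypothetical pre-fixed basis: Gaussian bumps on [0, 1].
CENTERS = np.linspace(0.0, 1.0, 5)

def phi(s, width=0.2):
    """Evaluate the fixed basis functions at state s."""
    return np.exp(-((s - CENTERS) ** 2) / (2 * width ** 2))

def td0_update(w, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: only the linear weights w are adjusted,
    the basis functions themselves stay fixed."""
    delta = r + gamma * (w @ phi(s_next)) - w @ phi(s)  # TD error
    return w + alpha * delta * phi(s)

# One update from zero weights on a single observed transition.
w = td0_update(np.zeros(5), s=0.2, r=1.0, s_next=0.3)
```

With zero initial weights the TD error equals the reward, so the update simply adds a scaled copy of the feature vector at the visited state.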

Cited by 151 publications (115 citation statements)
References 22 publications (26 reference statements)
“…The CE method has been successfully applied to a diverse range of estimation and optimization problems, including buffer allocation [1], queueing models of telecommunication systems [14,16], optimal control of HIV/AIDS spread [48,49], signal detection [30], combinatorial auctions [9], DNA sequence alignment [24,38], scheduling and vehicle routing [3,8,11,20,23,53], neural and reinforcement learning [31,32,34,52,54], project management [12], rare-event simulation with light- and heavy-tailed distributions [2,10,21,28], and clustering analysis [4,5,29]. Applications to classical combinatorial optimization problems including the max-cut, traveling salesman, and Hamiltonian cycle…”
Section: Introduction
confidence: 99%
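The cross-entropy (CE) method referenced above can be sketched in a few lines for a continuous maximization problem. The Gaussian sampling family, the elite fraction, and the toy objective below are illustrative assumptions; real applications (like the basis tuning in the cited paper) use problem-specific parameterizations.

```python
import numpy as np

def cross_entropy_max(f, mu=0.0, sigma=5.0, n=100, elite=10,
                      iters=30, seed=0):
    """Minimal CE sketch: sample candidates, keep the elite set,
    refit the Gaussian sampler to it, and repeat."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        xs = rng.normal(mu, sigma, n)           # sample candidates
        best = xs[np.argsort(f(xs))[-elite:]]   # highest-scoring set
        mu, sigma = best.mean(), best.std() + 1e-9  # refit sampler
    return mu

# Toy objective with a unique maximum at x = 3.
x_star = cross_entropy_max(lambda x: -(x - 3.0) ** 2)
```

Each iteration tightens the sampling distribution around the best candidates, which is why the method suits both rare-event estimation and noisy optimization.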
“…One class of methods aims at constructing a parsimonious set of features (basis functions). These include tuning the parameters of Gaussian RBFs using either a gradient or the cross-entropy method in the context of LSTD (Menache et al., 2005), deriving new basis functions with nonparametric techniques (Keller et al., 2006; Parr et al., 2007), or using a combination of numerical analysis and nonparametric techniques (Mahadevan, 2009). These methods, however, do not attempt to control the tradeoff between the approximation and estimation errors.…”
Section: The Choice of the Function Space
confidence: 99%
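The tunable-basis idea in the quote above — an outer loop adjusting an RBF parameter while an inner loop fits the linear weights — can be sketched as follows. The least-squares inner fit, the sinusoidal stand-in target, and all names below are illustrative assumptions (the cited work fits weights with LSTD on sampled trajectories, not a supervised target).

```python
import numpy as np

def rbf_features(states, centers, width):
    """Gaussian RBF feature matrix; width is the tunable basis parameter."""
    return np.exp(-((states[:, None] - centers[None, :]) ** 2)
                  / (2 * width ** 2))

def fit_and_score(width, states, targets, centers):
    """Inner loop: fit linear weights for this width, return squared error.
    An outer loop (gradient-based or CE) would search over width."""
    Phi = rbf_features(states, centers, width)
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return float(np.sum((Phi @ w - targets) ** 2))

states = np.linspace(0.0, 1.0, 50)
targets = np.sin(2 * np.pi * states)      # stand-in for a value function
centers = np.linspace(0.0, 1.0, 7)
scores = {w: fit_and_score(w, states, targets, centers)
          for w in (0.05, 0.15, 0.5)}
```

The score map over candidate widths is exactly the kind of objective a gradient or CE outer loop would optimize.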
“…Various methods have been developed for adaptively constructing basis functions, most of which use radial basis functions (RBFs) [4] and adjust the RBF parameters [11]. However, orthonormal bases are superior to non-orthogonal bases such as RBFs from the viewpoint of the trade-off between N and the approximation error [12].…”
Section: Reinforcement Learning
confidence: 99%
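The orthonormal alternative mentioned in the quote can be illustrated with the cosine (Fourier) basis on [0, 1], which is orthonormal under the uniform measure — unlike overlapping Gaussian RBFs. The normalization below is standard; its use here as an RL feature map is an illustrative assumption.

```python
import numpy as np

def fourier_features(s, n):
    """phi_0(s) = 1, phi_k(s) = sqrt(2) cos(k pi s) for k = 1..n-1."""
    k = np.arange(n)
    scale = np.where(k == 0, 1.0, np.sqrt(2.0))
    return scale * np.cos(np.pi * np.outer(np.atleast_1d(s), k))

# Check orthonormality numerically on a fine uniform grid:
# the Gram matrix should approximate the identity.
grid = np.linspace(0.0, 1.0, 20001)
Phi = fourier_features(grid, 4)
gram = Phi.T @ Phi / len(grid)   # approximates the L2 inner products
```

A near-identity Gram matrix is what makes the bias–variance (N versus approximation error) analysis in the quote tractable for orthonormal bases.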