2005
DOI: 10.1007/s10479-005-5732-z

Basis Function Adaptation in Temporal Difference Reinforcement Learning

Abstract: Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorit…
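The linear architecture described in the abstract can be illustrated with a minimal TD(0) sketch. The Gaussian basis on [0, 1], the step size, and the discount factor below are illustrative assumptions, not the paper's algorithm; only the weights of the pre-fixed basis are adjusted.

```python
import numpy as np

# Hypothetical pre-fixed basis: Gaussian bumps on [0, 1].
CENTERS = np.linspace(0.0, 1.0, 5)

def phi(s, width=0.2):
    """Evaluate the fixed basis functions at state s."""
    return np.exp(-((s - CENTERS) ** 2) / (2 * width ** 2))

def td0_update(w, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: only the linear weights w are adjusted,
    the basis functions themselves stay fixed."""
    delta = r + gamma * (w @ phi(s_next)) - w @ phi(s)  # TD error
    return w + alpha * delta * phi(s)

# One update from zero weights on a single observed transition.
w = td0_update(np.zeros(5), s=0.2, r=1.0, s_next=0.3)
```

With zero initial weights the TD error equals the reward, so the update simply adds a scaled copy of the feature vector at the visited state.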

Cited by 151 publications (115 citation statements)
References 22 publications (26 reference statements)
“…The CE method has been successfully applied to a diverse range of estimation and optimization problems, including buffer allocation [1], queueing models of telecommunication systems [14,16], optimal control of HIV/AIDS spread [48,49], signal detection [30], combinatorial auctions [9], DNA sequence alignment [24,38], scheduling and vehicle routing [3,8,11,20,23,53], neural and reinforcement learning [31,32,34,52,54], project management [12], rare-event simulation with light- and heavy-tailed distributions [2,10,21,28], and clustering analysis [4,5,29]. Applications to classical combinatorial optimization problems including the max-cut, traveling salesman, and Hamiltonian cycle…”
Section: Introduction
confidence: 99%
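The cross-entropy (CE) method referenced above can be sketched in a few lines for a continuous maximization problem. The Gaussian sampling family, the elite fraction, and the toy objective below are illustrative assumptions; real applications (like the basis tuning in the cited paper) use problem-specific parameterizations.

```python
import numpy as np

def cross_entropy_max(f, mu=0.0, sigma=5.0, n=100, elite=10,
                      iters=30, seed=0):
    """Minimal CE sketch: sample candidates, keep the elite set,
    refit the Gaussian sampler to it, and repeat."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        xs = rng.normal(mu, sigma, n)           # sample candidates
        best = xs[np.argsort(f(xs))[-elite:]]   # highest-scoring set
        mu, sigma = best.mean(), best.std() + 1e-9  # refit sampler
    return mu

# Toy objective with a unique maximum at x = 3.
x_star = cross_entropy_max(lambda x: -(x - 3.0) ** 2)
```

Each iteration tightens the sampling distribution around the best candidates, which is why the method suits both rare-event estimation and noisy optimization.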
“…One class of methods aims at constructing a parsimonious set of features (basis functions). These include tuning the parameters of Gaussian RBFs using either a gradient or the cross-entropy method in the context of LSTD (Menache et al., 2005), deriving new basis functions with nonparametric techniques (Keller et al., 2006; Parr et al., 2007), or using a combination of numerical analysis and nonparametric techniques (Mahadevan, 2009). These methods, however, do not attempt to control the tradeoff between the approximation and estimation errors.…”
Section: The Choice of the Function Space
confidence: 99%
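The tunable-basis idea in the quote above — an outer loop adjusting an RBF parameter while an inner loop fits the linear weights — can be sketched as follows. The least-squares inner fit, the sinusoidal stand-in target, and all names below are illustrative assumptions (the cited work fits weights with LSTD on sampled trajectories, not a supervised target).

```python
import numpy as np

def rbf_features(states, centers, width):
    """Gaussian RBF feature matrix; width is the tunable basis parameter."""
    return np.exp(-((states[:, None] - centers[None, :]) ** 2)
                  / (2 * width ** 2))

def fit_and_score(width, states, targets, centers):
    """Inner loop: fit linear weights for this width, return squared error.
    An outer loop (gradient-based or CE) would search over width."""
    Phi = rbf_features(states, centers, width)
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return float(np.sum((Phi @ w - targets) ** 2))

states = np.linspace(0.0, 1.0, 50)
targets = np.sin(2 * np.pi * states)      # stand-in for a value function
centers = np.linspace(0.0, 1.0, 7)
scores = {w: fit_and_score(w, states, targets, centers)
          for w in (0.05, 0.15, 0.5)}
```

The score map over candidate widths is exactly the kind of objective a gradient or CE outer loop would optimize.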
“…Various methods have been developed for adaptively constructing basis functions, most of which use radial basis functions (RBFs) [4] and adjust the RBF parameters [11]. However, orthonormal bases are superior to non-orthogonal bases such as RBFs from the viewpoint of the trade-off between N and the approximation error [12].…”
Section: Reinforcement Learning
confidence: 99%
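The orthonormal alternative mentioned in the quote can be illustrated with the cosine (Fourier) basis on [0, 1], which is orthonormal under the uniform measure — unlike overlapping Gaussian RBFs. The normalization below is standard; its use here as an RL feature map is an illustrative assumption.

```python
import numpy as np

def fourier_features(s, n):
    """phi_0(s) = 1, phi_k(s) = sqrt(2) cos(k pi s) for k = 1..n-1."""
    k = np.arange(n)
    scale = np.where(k == 0, 1.0, np.sqrt(2.0))
    return scale * np.cos(np.pi * np.outer(np.atleast_1d(s), k))

# Check orthonormality numerically on a fine uniform grid:
# the Gram matrix should approximate the identity.
grid = np.linspace(0.0, 1.0, 20001)
Phi = fourier_features(grid, 4)
gram = Phi.T @ Phi / len(grid)   # approximates the L2 inner products
```

A near-identity Gram matrix is what makes the bias–variance (N versus approximation error) analysis in the quote tractable for orthonormal bases.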