Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (2012)
DOI: 10.1002/9781118453988.ch23
Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning

Abstract: We present a novel feature adaptation scheme based on temporal difference learning for the problem of prediction. The scheme combines exploitation and exploration by (a) finding the worst basis vector in the feature matrix at each stage and replacing it with the current best estimate of the normalized value function, and (b) replacing the second-worst basis vector with a randomly chosen vector, so that a new subspace of basis vectors gets picked. We…
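The truncated abstract leaves the selection criterion for the "worst" basis vector unstated, so the sketch below fills that step with an assumed scoring rule (weight magnitude times column norm); the toy Markov chain, rewards, step sizes, and adaptation schedule are likewise illustrative, not the authors' setup. It shows the exploit/explore replacement loop in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 50, 5
gamma, alpha = 0.9, 0.05

# Toy prediction task: a random Markov chain with a fixed reward vector.
P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
r = rng.normal(size=n_states)

Phi = rng.normal(size=(n_states, n_features))  # feature matrix, basis vectors as columns
theta = np.zeros(n_features)

def td0_sweep(theta):
    """One expected (synchronous) TD(0) sweep over all states."""
    v = Phi @ theta
    td_err = r + gamma * (P @ v) - v           # expected TD error per state
    return theta + alpha * Phi.T @ td_err / n_states

for stage in range(200):
    for _ in range(50):                        # TD learning between adaptations
        theta = td0_sweep(theta)

    v_hat = Phi @ theta
    v_norm = v_hat / (np.linalg.norm(v_hat) + 1e-8)

    # Assumed scoring rule: rank basis vectors by how much the current value
    # estimate relies on them; the two lowest-scoring columns get replaced.
    scores = np.abs(theta) * np.linalg.norm(Phi, axis=0)
    worst, second_worst = np.argsort(scores)[:2]

    # (a) Exploitation: worst column <- current normalized value estimate.
    Phi[:, worst] = v_norm
    # (b) Exploration: second-worst column <- a fresh random direction, so a
    #     new subspace of basis vectors gets picked.
    Phi[:, second_worst] = rng.normal(size=n_states)
    theta[worst] = theta[second_worst] = 0.0   # reset weights of replaced features
```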

Cited by 3 publications (1 citation statement)
References 23 publications
“…Recently, there have been efforts to automate feature selection in RL. These include performing gradient descent on a manifold of features in the direction that minimises the mean-square Bellman error [6, 7], tuning feature parameters with a gradient-based or a cross-entropy-based method [8, 9], constructing features with nonparametric techniques [10–13], and expanding the set of features by using the TD error [14]. However, these methods do not consider the tradeoff between the approximation error and the estimation error.…”
Section: Introduction
Confidence: 99%
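For context, the mean-square Bellman error referenced for [6, 7] is, in its standard form (reconstructed from common usage, not quoted from the cited works):

\[
\mathrm{MSBE}(\theta) = \mathbb{E}_{s \sim d}\!\left[\big((T V_\theta)(s) - V_\theta(s)\big)^2\right],
\qquad (T V)(s) = \mathbb{E}\!\left[r_{t+1} + \gamma\, V(s_{t+1}) \mid s_t = s\right],
\]

with \(V_\theta = \Phi\theta\) linear in the features \(\Phi\), so that the error can be differentiated with respect to the features themselves as well as the weights.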