Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (2012)
DOI: 10.1002/9781118453988.ch23
Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning

Abstract: We present a novel feature adaptation scheme based on temporal difference learning for the problem of prediction. The scheme combines exploitation and exploration by (a) finding the worst basis vector in the feature matrix at each stage and replacing it with the current best estimate of the normalized value function, and (b) replacing the second-worst basis vector with a randomly chosen vector, so that a new subspace of basis vectors gets picked. We…
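The truncated abstract leaves the selection criterion for the "worst" basis vector unstated, so the sketch below fills that step with an assumed scoring rule (weight magnitude times column norm); the toy Markov chain, rewards, step sizes, and adaptation schedule are likewise illustrative, not the authors' setup. It shows the exploit/explore replacement loop in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 50, 5
gamma, alpha = 0.9, 0.05

# Toy prediction task: a random Markov chain with a fixed reward vector.
P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
r = rng.normal(size=n_states)

Phi = rng.normal(size=(n_states, n_features))  # feature matrix, basis vectors as columns
theta = np.zeros(n_features)

def td0_sweep(theta):
    """One expected (synchronous) TD(0) sweep over all states."""
    v = Phi @ theta
    td_err = r + gamma * (P @ v) - v           # expected TD error per state
    return theta + alpha * Phi.T @ td_err / n_states

for stage in range(200):
    for _ in range(50):                        # TD learning between adaptations
        theta = td0_sweep(theta)

    v_hat = Phi @ theta
    v_norm = v_hat / (np.linalg.norm(v_hat) + 1e-8)

    # Assumed scoring rule: rank basis vectors by how much the current value
    # estimate relies on them; the two lowest-scoring columns get replaced.
    scores = np.abs(theta) * np.linalg.norm(Phi, axis=0)
    worst, second_worst = np.argsort(scores)[:2]

    # (a) Exploitation: worst column <- current normalized value estimate.
    Phi[:, worst] = v_norm
    # (b) Exploration: second-worst column <- a fresh random direction, so a
    #     new subspace of basis vectors gets picked.
    Phi[:, second_worst] = rng.normal(size=n_states)
    theta[worst] = theta[second_worst] = 0.0   # reset weights of replaced features
```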

Cited by 3 publications (1 citation statement)
References 23 publications
“…Recently, there have been efforts to automate feature selection in RL. These include performing gradient descent on a manifold of features in the direction that minimises the mean-square Bellman error [6, 7], tuning feature parameters with a gradient-based or a cross-entropy-based method [8, 9], constructing features with nonparametric techniques [10–13], and expanding the set of features by using the TD error [14]. However, these methods do not consider the tradeoff between the approximation error and the estimation error.…”
Section: Introduction
Confidence: 99%
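For context, the mean-square Bellman error referenced for [6, 7] is, in its standard form (reconstructed from common usage, not quoted from the cited works):

\[
\mathrm{MSBE}(\theta) = \mathbb{E}_{s \sim d}\!\left[\big((T V_\theta)(s) - V_\theta(s)\big)^2\right],
\qquad (T V)(s) = \mathbb{E}\!\left[r_{t+1} + \gamma\, V(s_{t+1}) \mid s_t = s\right],
\]

with \(V_\theta = \Phi\theta\) linear in the features \(\Phi\), so that the error can be differentiated with respect to the features themselves as well as the weights.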