2016
DOI: 10.1145/2868723

Actor-Critic Algorithms with Online Feature Adaptation

Abstract: We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov Decision Processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other works for discounted cost MDPs. Our actor-critic architecture incorporates parameterization both in the policy and the value function. A gradient search in the policy parameters is performed to improve the performance of the actor. The computation of the aforementioned gradient, however, requires a…
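
The abstract's architecture pairs a parameterized policy (actor), a parameterized value function (critic), and online tuning of the critic's features. The snippet below is a minimal sketch of that general idea on a discounted toy problem, not the paper's algorithm: it assumes Gaussian RBF value features whose centres are adapted by a semi-gradient step on the squared TD error, and all names, step sizes, dynamics, and rewards are invented for illustration.

```python
import numpy as np

# Toy setup: 1-D state in [0, 1], two actions. Everything here (feature type,
# step sizes, dynamics, reward) is illustrative and not taken from the paper.

rng = np.random.default_rng(0)
n_features, n_actions, gamma = 8, 2, 0.95
centers = np.linspace(0.0, 1.0, n_features)    # tunable feature parameters
width = 0.1
theta_v = np.zeros(n_features)                 # critic (value-function) weights
theta_pi = np.zeros((n_actions, n_features))   # actor (policy) weights
alpha_v, alpha_pi, alpha_f = 0.1, 0.01, 0.001  # step sizes (critic fastest)

def features(s):
    # Gaussian RBF features with adaptable centres.
    return np.exp(-((s - centers) ** 2) / (2.0 * width ** 2))

def policy(phi):
    prefs = theta_pi @ phi
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def env_step(s, a):
    # Placeholder dynamics and reward; substitute the MDP of interest.
    s_next = np.clip(s + (0.1 if a == 1 else -0.1) + 0.02 * rng.standard_normal(), 0.0, 1.0)
    return s_next, -abs(s_next - 0.5)          # reward prefers staying near 0.5

s = rng.uniform()
for t in range(5000):
    phi = features(s)
    probs = policy(phi)
    a = rng.choice(n_actions, p=probs)
    s_next, r = env_step(s, a)

    # TD error computed with the current (adapted) features.
    delta = r + gamma * theta_v @ features(s_next) - theta_v @ phi

    # Critic: semi-gradient TD(0) on the linear value function.
    theta_v += alpha_v * delta * phi

    # Actor: policy-gradient step using the TD error as the advantage estimate.
    grad_log_pi = -np.outer(probs, phi)
    grad_log_pi[a] += phi
    theta_pi += alpha_pi * delta * grad_log_pi

    # Feature adaptation: semi-gradient step on 0.5 * delta**2 with respect to
    # the RBF centres (the bootstrapped target is treated as fixed).
    centers += alpha_f * delta * theta_v * (phi * (s - centers) / width ** 2)

    s = s_next
```
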

Cited by 7 publications (9 citation statements)
References 27 publications
“…Limit Theorem: As noted by [6], the fact that the deterministic gradient is a limit case of the stochastic gradient enables the standard machinery of policy gradient, such as compatible-function approximation ([13]), natural gradients ([14]), on-line feature adaptation ([15]) and actor-critic ([16]), to be used with deterministic policies. We show that it holds in our setting.…”
Section: Local Gradients of Deterministic Policies (mentioning)
confidence: 99%
“…Consider the following on-policy algorithm. The actor step is based on an expression for ∇_θᵢ J(μ_θ) in terms of ∇_aᵢ Q^θ (see Equation (15) in the Appendix). We approximate the action-value function Q^θ using a family of functions Q_ω : S × A → ℝ parameterized by ω, a column vector in ℝ^K.…”
Section: On-Policy Deterministic Actor-Critic (mentioning)
confidence: 99%
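
As a companion to the quoted description, here is a minimal sketch of an on-policy deterministic actor-critic update in this spirit: the critic Q_ω is linear in simple state-action features, and the actor moves θ along ∇_θ μ_θ(s) · ∂Q_ω/∂a evaluated at a = μ_θ(s). This is not the cited algorithm; the linear actor, the features ψ, the toy dynamics, and all step sizes are assumptions for illustration, and exploration is omitted for brevity.

```python
import numpy as np

# Sketch of a deterministic actor-critic step: linear actor μ_θ(s) = θᵀs,
# linear critic Q_ω(s, a) = ωᵀ ψ(s, a). All names and choices are illustrative.

rng = np.random.default_rng(1)
state_dim, gamma = 3, 0.95
theta = np.zeros(state_dim)        # actor parameters (scalar action)
omega = np.zeros(state_dim + 1)    # critic parameters
alpha_actor, alpha_critic = 1e-3, 1e-2

def mu(s):
    return float(theta @ s)

def psi(s, a):
    # Simple hand-picked state-action features; the cited paper's choice may differ.
    return np.append(s, a)

def q(s, a):
    return float(omega @ psi(s, a))

def env_step(s, a):
    # Placeholder dynamics and reward for the sketch.
    s_next = 0.9 * s + 0.1 * a + 0.01 * rng.standard_normal(state_dim)
    return s_next, -float(s_next @ s_next) - 0.01 * a ** 2

s = rng.standard_normal(state_dim)
for t in range(2000):
    a = mu(s)                       # on-policy: act with the deterministic policy
    s_next, r = env_step(s, a)

    # Critic: TD(0) on Q_ω along the on-policy transition.
    delta = r + gamma * q(s_next, mu(s_next)) - q(s, a)
    omega += alpha_critic * delta * psi(s, a)

    # Actor: deterministic policy gradient. For the linear actor ∇_θ μ_θ(s) = s,
    # and for the linear critic ∂Q_ω/∂a is the weight on the action feature.
    theta += alpha_actor * omega[-1] * s

    s = s_next
```
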
“…Actor-critic algorithms [11], [24] are a popular class of RL algorithms that utilise the policy gradient theorem to compute the optimal policy π_θ*. Here, the critic estimates the advantage value function parameters (x) and the actor uses them to improve the policy parameters (θ).…”
Section: On-Policy Actor-Critic Algorithms (mentioning)
confidence: 99%
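
For context, the standard policy-gradient-theorem form of the update being described (a textbook identity, not something specific to the quoted paper) is ∇_θ J(θ) = E[ ∇_θ log π_θ(a|s) · A^π_θ(s, a) ], with the expectation taken over states and actions visited under π_θ. Once the critic supplies an advantage estimate Â(s, a), the actor step is simply θ ← θ + α ∇_θ log π_θ(a|s) Â(s, a).
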
“…However, they explicitly compute/approximate a distance metric using the Fisher information matrix (curvature information), giving rise to complex algorithms. This estimation/approximation of the Fisher information matrix can be entirely avoided (see equation (11) of Section III) if the critic employs a linear function approximator with well-defined features known as compatible features [14].…”
Section: Introduction (mentioning)
confidence: 99%
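
To make the compatible-features remark concrete: in the standard compatible-function-approximation setup, the critic's features are ψ_θ(s, a) = ∇_θ log π_θ(a|s), and with a critic linear in these features the fitted weights already give the natural-gradient direction, so no Fisher matrix has to be formed or inverted. The fragment below is a generic illustration of that construction for a softmax policy, not equation (11) of the cited work; the names, step sizes, and the toy call at the end are made up.

```python
import numpy as np

# Generic compatible-feature critic for a softmax policy. With features
# ψ_θ(s, a) = ∇_θ log π_θ(a|s) and a critic linear in ψ_θ, the critic weights
# w can be used directly as the (natural-gradient) actor direction, so no
# Fisher matrix is estimated. Names and constants here are illustrative.

rng = np.random.default_rng(2)
n_features, n_actions = 4, 3
theta = np.zeros((n_actions, n_features))   # policy parameters
w = np.zeros(n_actions * n_features)        # compatible-critic weights
alpha_w, alpha_theta = 0.05, 0.01

def pi(phi):
    prefs = theta @ phi
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def compatible_features(phi, a):
    # ψ_θ(s, a) = ∇_θ log π_θ(a|s) for the softmax policy, flattened to match w.
    grad = -np.outer(pi(phi), phi)
    grad[a] += phi
    return grad.ravel()

def critic_update(phi, a, delta):
    # TD-style update of the linear compatible critic.
    global w
    w += alpha_w * delta * compatible_features(phi, a)

def actor_update():
    # Natural-gradient actor step: follow w directly, no Fisher inverse needed.
    global theta
    theta += alpha_theta * w.reshape(theta.shape)

# Tiny usage example with one arbitrary transition (delta would be a TD error).
phi_s = rng.standard_normal(n_features)
a = int(rng.integers(n_actions))
critic_update(phi_s, a, delta=0.5)
actor_update()
```
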
“…Recently, there have been efforts to automate feature selection in RL. These efforts include performing gradient descent on a manifold of features in the direction of minimising the mean squared Bellman error [6, 7], tuning feature parameters using a gradient‐based or a cross‐entropy‐based method [8, 9], constructing features with nonparametric techniques [10–13], and expanding the set of features by using the TD error [14]. However, these methods do not consider the tradeoff between approximation error and estimation error.…”
Section: Introduction (mentioning)
confidence: 99%
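
Of the feature-automation strategies listed above, the last one (growing the feature set where the TD error stays large) is easy to illustrate. The snippet below is a deliberately simplified, generic version of that idea, not the procedure of the cited reference [14]: the RBF form, the expansion threshold, and the example transition are all invented.

```python
import numpy as np

# Generic illustration of TD-error-driven feature expansion for a linear
# value function with 1-D Gaussian RBF features. Thresholds, widths, and the
# example transition are made up for the sketch.

gamma, width = 0.95, 0.1
centers = [0.25, 0.75]                 # initial RBF centres
weights = [0.0, 0.0]                   # matching linear value weights
alpha, expand_threshold = 0.1, 0.5

def features(s):
    return np.exp(-((s - np.array(centers)) ** 2) / (2 * width ** 2))

def value(s):
    return float(np.array(weights) @ features(s))

def td_update(s, r, s_next):
    global weights
    delta = r + gamma * value(s_next) - value(s)
    weights = list(np.array(weights) + alpha * delta * features(s))
    # Expansion rule: if the TD error at s is large and no existing centre is
    # close to s, add a new RBF centred at s with zero initial weight.
    if abs(delta) > expand_threshold and min(abs(s - c) for c in centers) > width:
        centers.append(s)
        weights.append(0.0)
    return delta

# Usage with one arbitrary transition (the MDP itself is not modelled here).
print(td_update(s=0.5, r=1.0, s_next=0.55))
print(len(centers))   # a third centre is added near s = 0.5 in this example
```
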