2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
DOI: 10.1109/adprl.2014.7010633

Pseudo-MDPs and factored linear action models

Abstract: In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a probability kernel. We show that the new framework captures many existing abstractions. We also introduce the concept of factored linear action models, a special case of this framework. Again, the relation of factored linear action models to existing work is discussed. We use the general framework to develop a theory for bounding the suboptimality of policies derived …
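To make the relaxation concrete, here is a minimal sketch (the construction and all names are my own, not the paper's code): ordinary value iteration applied to a non-negative "transition kernel" whose rows deliberately do not sum to one. This is the kind of object a pseudo-MDP admits; the Bellman backup is unchanged and still contracts in the sup norm as long as the discount factor times the largest row sum stays below one.

```python
import numpy as np

# Minimal sketch (illustrative only): value iteration on a pseudo-MDP whose
# "transition kernels" are non-negative but do NOT have rows summing to one.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 2, 0.9

P = rng.random((n_actions, n_states, n_states))
P *= 0.95 / P.sum(axis=2, keepdims=True)      # rows sum to 0.95, not 1.0
r = rng.random((n_actions, n_states))

V = np.zeros(n_states)
for _ in range(500):
    # Standard Bellman optimality backup, exactly as in an ordinary MDP;
    # it contracts because gamma * 0.95 < 1.
    Q = r + gamma * np.einsum("axy,y->ax", P, V)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

greedy_policy = Q.argmax(axis=0)
print(V, greedy_policy)
```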

Cited by 11 publications (22 citation statements)
References 15 publications (12 reference statements)
“…It estimates the value function directly from a sensed experience (Sutton and Barto 2018). On the other hand, the model-based RL approach uses an estimated transition function to compute the optimal policy (Yao and Szepesvári 2012; Yao et al. 2014; Sutton and Barto 2018; Moerland, Broekens, and Jonker 2020). A model-based RL method usually has a planning component, which learns and uses a model to approximate value functions.…”
Section: Focusing on Model-Based Reinforcement Learning (mentioning)
confidence: 99%
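For the planning component mentioned in this statement, here is a minimal sketch (my own illustration, not the algorithm of any cited paper; the function names are assumptions): a tabular model estimated from counted transitions, followed by value iteration on that estimated model to approximate the value function and a greedy policy.

```python
import numpy as np

def estimate_model(transitions, n_states, n_actions):
    """transitions: list of (state, action, reward, next_state) tuples."""
    counts = np.zeros((n_actions, n_states, n_states))
    rewards = np.zeros((n_actions, n_states))
    visits = np.zeros((n_actions, n_states))
    for s, a, r, s_next in transitions:
        counts[a, s, s_next] += 1.0
        rewards[a, s] += r
        visits[a, s] += 1.0
    visits = np.maximum(visits, 1.0)          # unvisited pairs keep zero rows
    P_hat = counts / visits[:, :, None]       # estimated transition model
    r_hat = rewards / visits                  # estimated mean reward
    return P_hat, r_hat

def plan(P_hat, r_hat, gamma=0.95, n_iters=200):
    """Value iteration on the estimated model (the planning component)."""
    V = np.zeros(P_hat.shape[1])
    for _ in range(n_iters):
        Q = r_hat + gamma * np.einsum("axy,y->ax", P_hat, V)
        V = Q.max(axis=0)
    return Q.argmax(axis=0), V                # greedy policy and its values
```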
“…Online learning under this assumption has received substantial attention in the recent literature, and in particular has been shown to be satisfied in the class of so-called linear MDPs studied by Jin et al. [24], Cai et al. [14] and low-rank MDPs studied by Yang and Wang [44], which are both special cases of factored linear models [45, 34].…”
Section: Assumption 1 (Realizable Function Approximation) (mentioning)
confidence: 99%
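As a side note in my own notation (a sketch of the relationship, not a quotation from the cited works): a linear MDP posits transitions and rewards that are linear in a known feature map, and stacking those features into a matrix exposes the factored structure.

```latex
% Sketch (my notation): a linear MDP assumes, for a known feature map \varphi
% and unknown measures \mu_h and vector \theta_h,
\[
  P_h(x' \mid x, a) = \langle \varphi(x, a), \mu_h(x') \rangle,
  \qquad
  r_h(x, a) = \langle \varphi(x, a), \theta_h \rangle .
\]
% Stacking the \varphi(x,a)^{\top} as rows of a matrix \Phi and the \mu_h(x')
% as columns of a matrix M_h gives the factored form
\[
  P_h = \Phi M_h, \qquad r_h = \Phi \theta_h ,
\]
% i.e. the transition matrix factors through the feature map, which is the
% defining property of a factored linear model.
```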
“…This allows us to define the S × d feature matrix Φ with its x-th row being ϕ^T(x), and represent the action-value function as Q_{h,a} = Φ θ_{h,a}. We make the following assumption: Assumption 1 (Factored linear MDP [49, 37, 25]). For each action a and stage h, there exists a d × S matrix M_{h,a} and a vector ρ_a such that the transition matrix can be written as P_{h,a} = Φ M_{h,a}, and the reward function as r_a = Φ ρ_a.…”
Section: Linear Function Approximation in MDPs (mentioning)
confidence: 99%
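A minimal numerical sketch of that assumption (variable names such as Phi, M, and rho are mine, not the cited paper's code): when both the transition matrix and the reward factor through the feature matrix, a single Bellman backup keeps the action-value function inside the span of the features, which is the realizability used above. Note also that a product Φ M_{h,a} built this way need not have rows summing to one, which is where the pseudo-MDP relaxation is convenient.

```python
import numpy as np

# Sketch of the factored structure: P = Phi @ M and r = Phi @ rho imply that
# the backup Q = r + P @ V_next equals Phi @ (rho + M @ V_next), so Q lies in
# the span of the d feature columns.
rng = np.random.default_rng(1)
S, d = 30, 5

Phi = rng.random((S, d))     # S x d feature matrix, row x is phi(x)^T
M = rng.random((d, S))       # d x S factor for one (action, stage) pair
rho = rng.random(d)

P = Phi @ M                  # S x S transition matrix (rows need not sum to 1)
r = Phi @ rho                # S-dimensional reward vector
V_next = rng.random(S)       # any next-stage value function

Q_direct = r + P @ V_next            # standard one-step backup
theta = rho + M @ V_next             # the same backup as a d-dim parameter
Q_factored = Phi @ theta

assert np.allclose(Q_direct, Q_factored)
print("Q is representable as Phi @ theta with theta of dimension", d)
```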