Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 2006
DOI: 10.1145/1143844.1143877

Learning the structure of Factored Markov Decision Processes in reinforcement learning problems

Abstract: Recent decision-theoretic planning algorithms are able to find optimal solutions in large problems, using Factored Markov Decision Processes (FMDPs). However, these algorithms need a perfect knowledge of the structure of the problem. In this paper, we propose SDYNA, a general framework for addressing large reinforcement learning problems by trial-and-error and with no initial knowledge of their structure. SDYNA integrates incremental planning algorithms based on FMDPs with supervised learning techniques building…

Year Published: 2010–2023


Cited by 66 publications (54 citation statements)
References 12 publications
“…TeXDYNA ideas are first inspired by Sutton's DYNA architecture [8], enriched and adapted to FMDPs by [6]. Second, as to the exit-oriented options representation, some ideas come from the HEXQ [4] and VISA [10,5] frameworks, where the exit definition proposed in HEXQ is extended to include variable change and context in order to address more complex structures.…”
Section: Discussion
confidence: 99%
“…Structured Dynamic Programming (SDP) algorithms such as SVI [7] take advantage of this structure to compute a policy compactly. Structured-DYNA (SDYNA) [6] is a general framework that adapts indirect RL of the DYNA family [8] to the FMDP framework. SPITI is a particular instance of SDYNA based on a decision-tree induction process to learn the structure of the problem and on SVI to obtain an efficient policy.…”
Section: A Quick Index To The Background
confidence: 99%
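The indirect (model-based) loop that the statement above attributes to SDYNA — act, update a learned model of transitions and rewards, then plan on that model — can be sketched on a toy problem. This is a minimal illustration only: it replaces SPITI's decision-tree induction with a plain tabular model, and all names (`ToyEnv`, `plan`) are hypothetical, not from the paper.

```python
# Hypothetical sketch of an indirect-RL (DYNA/SDYNA-style) loop:
# act in the environment, learn a transition/reward model, plan on it.
import random

class ToyEnv:
    """Two states {0,1}; action 1 toggles the state, action 0 keeps it.
    Being in state 1 yields reward 1."""
    def __init__(self):
        self.state = 0
    def step(self, action):
        if action == 1:
            self.state = 1 - self.state
        return self.state, (1.0 if self.state == 1 else 0.0)

def plan(model, rewards, gamma=0.9, iters=50):
    """Value iteration on the learned (deterministic) model."""
    V = {0: 0.0, 1: 0.0}
    for _ in range(iters):
        V = {s: max(rewards.get((s, a), 0.0) + gamma * V[model.get((s, a), s)]
                    for a in (0, 1))
             for s in (0, 1)}
    return V

random.seed(0)
env, model, rewards = ToyEnv(), {}, {}
s = env.state
for _ in range(200):            # act: random exploration
    a = random.choice((0, 1))
    s2, r = env.step(a)
    model[(s, a)] = s2          # learn: update the transition model
    rewards[(s, a)] = r         # learn: update the reward model
    s = s2
V = plan(model, rewards)        # plan: value iteration on the learned model
policy = {st: max((0, 1),
                  key=lambda a: rewards.get((st, a), 0.0)
                                + 0.9 * V[model.get((st, a), st)])
          for st in (0, 1)}
print(policy)  # → {0: 1, 1: 0}: toggle into the rewarding state, then stay
```

SPITI differs from this sketch in that the model is not a table but a set of induced decision trees, which is what lets it scale to the factored state spaces the statement describes.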
See 2 more Smart Citations
“…Starting from the variable that changes most frequently, a level of hierarchy is generated for each variable, and at each level the states that are represented by the values of the corresponding variable are partitioned into regions; the regions are identified by state-action pairs (called exits) that cause unpredictable transitions; separate policies that leave each region through its exits form the temporal abstractions. In the factored reinforcement learning setting, the TeXDYNA algorithm of Kozlova et al. (2009) simultaneously decomposes a factored MDP into a set of options and incrementally improves the local policy of each option by using a particular decision-tree-based instance of the model-based reinforcement learning framework SDYNA (Degris et al. 2006). In their approach, the options are determined by exits as in HEXQ, but the variable whose values determine the context is itself explicit in the exit definition; furthermore, to ensure their relevance, exits are updated every time the model of transitions changes.…”
Section: Related Work
confidence: 99%