Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 2006
DOI: 10.1145/1143844.1143877

Learning the structure of Factored Markov Decision Processes in reinforcement learning problems

Abstract: Recent decision-theoretic planning algorithms are able to find optimal solutions in large problems, using Factored Markov Decision Processes (FMDPs). However, these algorithms need a perfect knowledge of the structure of the problem. In this paper, we propose SDYNA, a general framework for addressing large reinforcement learning problems by trial-and-error and with no initial knowledge of their structure. SDYNA integrates incremental planning algorithms based on FMDPs with supervised learning techniques building…

Year Published: 2010–2023


Cited by 66 publications (54 citation statements)
References 12 publications
“…TeXDYNA ideas are first inspired by Sutton's DYNA architecture [8], enriched and adapted to FMDPs by [6]. Second, as to the exit-oriented options representation, some ideas come from the HEXQ [4] and VISA [10,5] frameworks, where the exit definition proposed in HEXQ is extended to include variable change and context in order to address more complex structures.…”
Section: Discussion
confidence: 99%
“…Structured Dynamic Programming (SDP) algorithms such as SVI [7] take advantage of this structure to compute a policy compactly. Structured-DYNA (SDYNA) [6] is a general framework that adapts indirect RL of the DYNA family [8] to the FMDP framework. SPITI is a particular instance of SDYNA based on a decision-tree induction process to learn the structure of the problem and on SVI to obtain an efficient policy.…”
Section: A Quick Index To The Background
confidence: 99%
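The indirect (model-based) loop that the statement above attributes to SDYNA — act, update a learned model of transitions and rewards, then plan on that model — can be sketched on a toy problem. This is a minimal illustration only: it replaces SPITI's decision-tree induction with a plain tabular model, and all names (`ToyEnv`, `plan`) are hypothetical, not from the paper.

```python
# Hypothetical sketch of an indirect-RL (DYNA/SDYNA-style) loop:
# act in the environment, learn a transition/reward model, plan on it.
import random

class ToyEnv:
    """Two states {0,1}; action 1 toggles the state, action 0 keeps it.
    Being in state 1 yields reward 1."""
    def __init__(self):
        self.state = 0
    def step(self, action):
        if action == 1:
            self.state = 1 - self.state
        return self.state, (1.0 if self.state == 1 else 0.0)

def plan(model, rewards, gamma=0.9, iters=50):
    """Value iteration on the learned (deterministic) model."""
    V = {0: 0.0, 1: 0.0}
    for _ in range(iters):
        V = {s: max(rewards.get((s, a), 0.0) + gamma * V[model.get((s, a), s)]
                    for a in (0, 1))
             for s in (0, 1)}
    return V

random.seed(0)
env, model, rewards = ToyEnv(), {}, {}
s = env.state
for _ in range(200):            # act: random exploration
    a = random.choice((0, 1))
    s2, r = env.step(a)
    model[(s, a)] = s2          # learn: update the transition model
    rewards[(s, a)] = r         # learn: update the reward model
    s = s2
V = plan(model, rewards)        # plan: value iteration on the learned model
policy = {st: max((0, 1),
                  key=lambda a: rewards.get((st, a), 0.0)
                                + 0.9 * V[model.get((st, a), st)])
          for st in (0, 1)}
print(policy)  # → {0: 1, 1: 0}: toggle into the rewarding state, then stay
```

SPITI differs from this sketch in that the model is not a table but a set of induced decision trees, which is what lets it scale to the factored state spaces the statement describes.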
See 2 more Smart Citations
“…Starting from the variable that changes most frequently, a level of hierarchy is generated for each variable, and at each level the states that are represented by the values of the corresponding variable are partitioned into regions; the regions are identified by state-action pairs (called exits) that cause unpredictable transitions; separate policies that leave each region through its exits form the temporal abstractions. In the factored reinforcement learning setting, the TeXDYNA algorithm of Kozlova et al. (2009) simultaneously decomposes a factored MDP into a set of options and incrementally improves the local policy of each option by using a particular decision-tree-based instance of the model-based reinforcement learning framework SDYNA (Degris et al. 2006). In their approach, the options are determined by exits as in HEXQ, but the variable whose values determine the context is itself explicit in the exit definition; furthermore, to ensure their relevance, exits are updated every time the model of transitions changes.…”
Section: Related Work
confidence: 99%