Learning Representation and Control in Markov Decision Processes: New Frontiers (2007)
DOI: 10.1561/2200000003

Cited by 51 publications (33 citation statements). References 115 publications (163 reference statements).
“…A directed graph was constructed from the samples where vertices correspond to state variables. The digraph construction followed along the lines discussed in (Mahadevan et al, 2006) with a slight difference for the continuous domain (discussed in detail below). In the discrete domain, directed edges were added for actual state transitions seen in the training episodes.…”
Section: Methods
Mentioning confidence: 99%
“…Both directed Laplacians are symmetric matrices, ensuring a complete orthonormal basis of real eigenvectors. The symmetrization ½(ΨP + PᵀΨ) essentially creates an undirected graph with edge weights … Mahadevan et al. (2006) proposed using three different symmetrization techniques:…”
Section: Directed Graph Laplacian
Mentioning confidence: 99%
“…One class of methods aims at constructing a parsimonious set of features (basis functions). These include tuning the parameters of Gaussian RBFs using either a gradient- or the cross-entropy method in the context of LSTD (Menache et al., 2005), deriving new basis functions with nonparametric techniques (Keller et al., 2006; Parr et al., 2007), or using a combination of numerical analysis and nonparametric techniques (Mahadevan, 2009). These methods, however, do not attempt to control the tradeoff between the approximation and estimation errors.…”
Section: The Choice of the Function Space
Mentioning confidence: 99%
“…Option goal states have been selected by a variety of methods, the most common relying on computing visit or reward statistics over individual states to identify useful subgoals (Digney 1996, McGovern and Barto 2001, Şimşek and Barto 2004, 2009). Graph-based methods (Mannor et al. 2004, Menache et al. 2002, Şimşek et al. 2005) build a state-transition graph and use its properties (e.g., local graph cuts, Şimşek et al. 2005) to identify option goals.…”
Section: Hierarchical Reinforcement Learning
Mentioning confidence: 99%