DOI: 10.29007/6jsv

What if the World Were Different? Gradient-Based Exploration for New Optimal Policies

Abstract: Planning under uncertainty assumes a model of the world that specifies the probabilistic effects of the actions of an agent in terms of changes of the state. Given such a model, planning proceeds to determine a policy that defines for each state the choice of action that the agent should follow in order to maximize a reward function. In this work, we realize that the world can be changed in more ways than those possible by the execution of the agent's repertoire of actions. These additional configurations of the…

Cited by 4 publications (9 citation statements)
References 14 publications
“…where, for compactness, given a policy π and world P we define µ^π_P = µ_0 (I − γP_π)^{−1}. This matches the gradient step of the method proposed previously by Silva, Melo, and Veloso (2018). The gradient step of this method was introduced as an approximation.…”
Section: Fixed Policy Differentiation (supporting)
confidence: 72%
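The statement above defines the discounted state-distribution vector µ^π_P = µ_0 (I − γP_π)^{−1}, i.e. the (unnormalized) expected discounted visitation frequencies of states under policy π in world P. A minimal numerical sketch of this quantity, on a hypothetical 3-state chain that is not taken from the paper:

```python
import numpy as np

# Toy example (assumed, not from the paper): a 3-state Markov chain
# induced by some fixed policy pi, with row-stochastic transition matrix P_pi.
gamma = 0.9
P_pi = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
])
mu_0 = np.array([1.0, 0.0, 0.0])  # initial state distribution

# Discounted occupancy: mu = mu_0 (I - gamma * P_pi)^{-1}.
# Rather than inverting, solve the transposed linear system
# (I - gamma * P_pi)^T mu^T = mu_0^T.
mu = np.linalg.solve((np.eye(3) - gamma * P_pi).T, mu_0)

# Sanity check: since P_pi is row-stochastic, the entries of mu
# sum to 1 / (1 - gamma) = 10 for gamma = 0.9.
print(mu, mu.sum())
```

Solving the linear system instead of forming the inverse is the standard numerically stable choice; the same vector is what a gradient step over the world parameters P would be weighted by.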
“…Recently, we have seen a shift in this paradigm, with new approaches that allow the agent to explicitly reason at a "meta-level" about other possible configurations of the world: those configurations that are achievable indirectly, through changes of environmental features controllable only before planning time. Experimental evaluation of this new paradigm showed promising results on different planning scenarios modeled as Markov decision processes (Metelli, Mutti, and Restelli 2018; Silva, Melo, and Veloso 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…Instead, the possibility to strategically act on the environmental dynamics is studied in only a limited number of works. Some approaches belong to the planning area [12, 38], some are constrained to specific forms of environment configurability [8, 9, 34], and others are based on the curriculum learning framework [4, 7]. The goal of the dissertation [18] is to provide a uniform treatment of environment configurability in its diverse aspects.…”
Section: Configurable Environments (mentioning)
confidence: 99%
“…The knowledge of the agent's policy space could be of crucial importance when the learning process involves the presence of an external supervisor. Recently, the notion of Configurable Markov Decision Process (Conf-MDP; Metelli, Mutti, and Restelli 2018) has been introduced to account for the real-world scenarios in which it is possible to exercise a (possibly partial) control over the environment, by means of a set of environmental parameters (e.g., Silva, Melo, and Veloso 2018; Silva et al. 2019). This activity, called environment configuration, can be carried out by the agent itself or by an external supervisor.…”
Section: Introduction (mentioning)
confidence: 99%