DOI: 10.29007/6jsv

What if the World Were Different? Gradient-Based Exploration for New Optimal Policies

Abstract: Planning under uncertainty assumes a model of the world that specifies the probabilistic effects of the actions of an agent in terms of changes of the state. Given such a model, planning proceeds to determine a policy that defines for each state the choice of action that the agent should follow in order to maximize a reward function. In this work, we realize that the world can be changed in more ways than those possible by the execution of the agent's repertoire of actions. These additional configurations of the…

Cited by 4 publications (9 citation statements)
References 14 publications
“…where, for compactness, given a policy π and world P we define µ^π_P = µ_0 (I − γP_π)^{−1}. This matches the gradient step of the method proposed previously by Silva, Melo, and Veloso (2018). The gradient step of this method was introduced as an approximation.…”
Section: Fixed Policy Differentiation (supporting)
confidence: 72%
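The statement above defines the discounted state-distribution vector µ^π_P = µ_0 (I − γP_π)^{−1}, i.e. the (unnormalized) expected discounted visitation frequencies of states under policy π in world P. A minimal numerical sketch of this quantity, on a hypothetical 3-state chain that is not taken from the paper:

```python
import numpy as np

# Toy example (assumed, not from the paper): a 3-state Markov chain
# induced by some fixed policy pi, with row-stochastic transition matrix P_pi.
gamma = 0.9
P_pi = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
])
mu_0 = np.array([1.0, 0.0, 0.0])  # initial state distribution

# Discounted occupancy: mu = mu_0 (I - gamma * P_pi)^{-1}.
# Rather than inverting, solve the transposed linear system
# (I - gamma * P_pi)^T mu^T = mu_0^T.
mu = np.linalg.solve((np.eye(3) - gamma * P_pi).T, mu_0)

# Sanity check: since P_pi is row-stochastic, the entries of mu
# sum to 1 / (1 - gamma) = 10 for gamma = 0.9.
print(mu, mu.sum())
```

Solving the linear system instead of forming the inverse is the standard numerically stable choice; the same vector is what a gradient step over the world parameters P would be weighted by.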
“…Recently, we have seen a shift in this paradigm, with new approaches that allow the agent to explicitly reason at a "meta-level" about other possible configurations of the world: those configurations that are achievable indirectly, through changes of environmental features controllable only before planning time. Experimental evaluation of this new paradigm showed promising results on different planning scenarios modeled as Markov decision processes (Metelli, Mutti, and Restelli 2018; Silva, Melo, and Veloso 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…Instead, the possibility to strategically act on the environmental dynamics is studied in only a limited number of works. Some approaches belong to the planning area [12, 38], some are constrained to specific forms of environment configurability [8, 9, 34], and others are based on the curriculum learning framework [4, 7]. The goal of the dissertation [18] is to provide a uniform treatment of environment configurability in its diverse aspects.…”
Section: Configurable Environments (mentioning)
confidence: 99%
“…The knowledge of the agent's policy space could be of crucial importance when the learning process involves the presence of an external supervisor. Recently, the notion of Configurable Markov Decision Process (Conf-MDP; Metelli, Mutti, and Restelli 2018) has been introduced to account for the real-world scenarios in which it is possible to exercise a (possibly partial) control over the environment, by means of a set of environmental parameters (e.g., Silva, Melo, and Veloso 2018; Silva et al. 2019). This activity, called environment configuration, can be carried out by the agent itself or by an external supervisor.…”
Section: Introduction (mentioning)
confidence: 99%