2021 | DOI: 10.1007/s10994-021-06033-3
Policy space identification in configurable environments

Abstract: We study the problem of identifying the policy space available to an agent in a learning process, having access to a set of demonstrations generated by the agent playing the optimal policy in the considered space. We introduce an approach based on frequentist statistical testing to identify the set of policy parameters that the agent can control, within a larger parametric policy space. After presenting two identification rules (combinatorial and simplified), applicable under different assumptions on the polic…
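As a rough illustration of the kind of frequentist test the abstract describes, the sketch below runs a per-parameter generalized likelihood-ratio test against the negative log-likelihood of the demonstrations. The function name, the choice of clamping a parameter to zero in the restricted fit, and the chi-square threshold are assumptions made here for illustration; the paper's actual combinatorial and simplified identification rules are specified in the full text.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def identify_controllable_params(neg_log_lik, dim, alpha=0.05):
    """Illustrative sketch: return indices of policy parameters whose removal
    significantly worsens the fit to the demonstrations, i.e. parameters the
    agent plausibly controls.

    neg_log_lik: hypothetical callable mapping a parameter vector theta of
    length `dim` to the negative log-likelihood of the observed state-action
    demonstrations under the parametric policy pi_theta.
    """
    theta0 = np.zeros(dim)
    full = minimize(neg_log_lik, theta0)  # MLE over the full parameter space
    controllable = []
    for i in range(dim):
        # Restricted model: parameter i clamped to a reference value (0 here).
        def restricted(theta_rest, i=i):
            theta = np.insert(theta_rest, i, 0.0)
            return neg_log_lik(theta)
        rest = minimize(restricted, np.delete(theta0, i))
        # Likelihood-ratio statistic; asymptotically chi-square with 1 dof.
        lr = 2.0 * (rest.fun - full.fun)
        if lr > stats.chi2.ppf(1.0 - alpha, df=1):
            controllable.append(i)
    return controllable
```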

Cited by 14 publications (31 citation statements) | References 20 publications

“…We also demonstrate in our experiments that our sample dropout technique can boost the sample efficiency of GAE-based policy optimization algorithms. [28] and [29] propose an actor-only policy optimization algorithm that alternates online and offline optimization via importance sampling. To capture the uncertainty induced by importance sampling, they propose a surrogate objective function derived from a statistical bound on the estimated performance, which bounds the variance of the surrogate objective in terms of the Rényi divergence.…”
Section: Variance Reduction in Policy Gradient (mentioning; confidence: 99%)
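As context for the bound this statement refers to, a surrogate of that kind penalizes the importance-sampling estimate of performance with a term that grows with the exponentiated 2-Rényi divergence between the target and behavioral trajectory distributions. The constant and notation below are assumptions; see [28], [29] for the precise statement.

```latex
J(\theta') \;\ge\; \hat{J}_{\mathrm{IS}}(\theta')
  \;-\; C\,\sqrt{\frac{d_2\!\left(p_{\theta'}\,\middle\|\,p_{\theta}\right)}{N}}
  \qquad \text{with probability at least } 1-\delta,
```

where $\hat{J}_{\mathrm{IS}}$ is the importance-sampling estimate from $N$ behavioral trajectories, $d_2$ is the exponentiated 2-Rényi divergence, and $C$ depends on the return range and $\delta$; maximizing the right-hand side yields the variance-aware surrogate objective.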
“…Conf-MDPs have been introduced in [7] for finite spaces, and extended in [9] to more complex continuous environments. In these seminal works, the agent is fully responsible for the configuration activity of the environment, which, in turn, becomes an auxiliary task for optimizing performance.…”
Section: Related Work (mentioning; confidence: 99%)
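For reference, a common formalization of a Conf-MDP extends the usual MDP tuple with a set of admissible transition models; the notation below is an assumption, see [7] for the original definition.

```latex
\mathcal{CM} \;=\; \left(\mathcal{S},\, \mathcal{A},\, R,\, \gamma,\, \mu,\, \mathcal{P},\, \Pi\right),
```

where $\mathcal{P}$ is the set of configurable transition models $p(s' \mid s, a)$ and $\Pi$ the set of admissible policies; both the policy and the environment configuration are chosen to maximize expected return.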
“…Indeed, in Conf-MDPs, the agent is not interested in learning and gathering experience samples in sub-optimal configurations; its interest lies solely in the optimal policy under the optimal environmental configuration. The configuration activity within the environment, as shown in more recent works [8], [11], can also be carried out by an external entity (i.e., a configurator) whose goals can even be adversarial w.r.t. those of the agent [11].…”
Section: Related Work (mentioning; confidence: 99%)