2020
DOI: 10.48550/arxiv.2006.01738
Preprint
Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

Abstract: In this work, we generalize direct policy search algorithms to an algorithm we call Direct Environment Search with (projected stochastic) Gradient Ascent (DESGA). DESGA can be used to jointly learn a reinforcement learning (RL) environment and a policy with maximal expected return over a joint hypothesis space of environments and policies. We illustrate the performance of DESGA on two benchmarks. First, we consider a parametrized space of Mass-Spring-Damper (MSD) environments. Then, we use our algorit…
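The abstract describes optimizing a single joint parameter vector (environment parameters plus policy parameters) by stochastic gradient ascent on the expected return, with a projection step keeping the iterate inside the hypothesis space. The sketch below is not the paper's implementation: the toy one-dimensional environment, the box constraints, and the two-point finite-difference gradient estimator are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(env_param, policy_param, horizon=20):
    # Toy 1-D stand-in for an MSD-like environment: the state decays at a
    # rate set by env_param, and a linear policy pushes it toward zero.
    s, total = 1.0, 0.0
    for _ in range(horizon):
        a = -policy_param * s
        s = env_param * s + a + 0.05 * rng.standard_normal()
        total += -s**2  # reward: keep the state near zero
    return total

def project(theta, lo, hi):
    # Projection onto a box [lo, hi] -- the "projected" step of the ascent.
    return np.clip(theta, lo, hi)

theta = np.array([0.9, 0.1])        # joint vector: [env_param, policy_param]
lo = np.array([0.5, 0.0])           # hypothesis-space bounds (assumed)
hi = np.array([1.0, 1.0])
lr, eps = 0.05, 1e-2

for _ in range(200):
    # Two-point stochastic estimate of the return's gradient.
    u = rng.standard_normal(2)
    g = (rollout_return(*(theta + eps * u)) -
         rollout_return(*(theta - eps * u))) / (2 * eps) * u
    theta = project(theta + lr * g, lo, hi)
```

The key design point mirrored from the abstract is that the environment parameter is updated by the same gradient step as the policy parameter, rather than being held fixed as in ordinary direct policy search.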


Cited by 1 publication (1 citation statement)
References 7 publications
“…Several approaches are being developed for energy management systems [6][7][8][9][27,28]. The latest, based on deep reinforcement learning, while producing results and being developed in open-source frameworks [29][30][31][32], is considered a black-box model [33], and its results are hard to explain, analyze, and validate for domain experts. The approach developed in this paper can be considered a proxy for off-policy reinforcement learning [34].…”
Section: Contributions To Novelty
confidence: 99%