Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities 2020
DOI: 10.1145/3427773.3427863

Augmenting Reinforcement Learning with a Planning Model for Optimizing Energy Demand Response

Abstract: While reinforcement learning (RL) on humans has shown incredible promise, it often suffers from a scarcity of data and a limited number of interaction steps. In instances like these, a planning model of human behavior may greatly help. We present an experimental setup for the development and testing of a Soft Actor Critic (SAC) V2 RL architecture paired with several different planning-model architectures: an AutoML-optimized LSTM, an OLS model, and a baseline model. We present the effects of including a planning model in agent learning wi…
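The setup described in the abstract pairs an SAC price-setting agent with a learned planning model of occupant demand response, so the agent can train on model-generated rollouts when real interaction data is scarce. The sketch below illustrates that idea in Python with a simple OLS planning model; all names (OLSPlanningModel, augment_buffer) and the toy reward are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch: augment scarce real data with rollouts from a learned
# planning model of hourly demand response. Names and reward are illustrative.
import numpy as np

class OLSPlanningModel:
    """Linear (OLS) map from hourly prices to predicted hourly demand."""
    def fit(self, prices, demands):
        # prices: (n_days, 24), demands: (n_days, 24); fit one linear map plus a bias term
        X = np.hstack([prices, np.ones((len(prices), 1))])
        self.W, *_ = np.linalg.lstsq(X, demands, rcond=None)

    def predict(self, price_vector):
        x = np.append(price_vector, 1.0)
        return x @ self.W

def augment_buffer(model, propose_prices, n_rollouts=100):
    """Generate synthetic (prices, demand, reward) tuples from the planning model."""
    synthetic = []
    for _ in range(n_rollouts):
        prices = propose_prices()                 # candidate hourly price signal
        demand = model.predict(prices)
        reward = -float(prices @ demand)          # placeholder: negative energy cost
        synthetic.append((prices, demand, reward))
    return synthetic

# Usage: fit on a handful of real days, then expand the replay data with model
# rollouts before running SAC (or any off-policy RL) updates on the union.
real_prices, real_demands = np.random.rand(10, 24), np.random.rand(10, 24)
model = OLSPlanningModel()
model.fit(real_prices, real_demands)
extra_transitions = augment_buffer(model, propose_prices=lambda: np.random.rand(24))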

Cited by 9 publications (7 citation statements) · References 7 publications
“…A variant of these methods, Proximal Policy Optimization (PPO) algorithms ([19]), optimizes a surrogate loss (1), which enables multiple gradient updates to be done on these actors using the same samples. However, even with PPO optimization, several months' worth of real-world training data would have to be collected to fully train an hourly price-setting controller [21] in our Social Game. We seek to leverage a detailed simulation with behaviorally reasonable dynamics encoded in a model that can train on both simulated and experimental environments.…”
Section: Background 2.1 Reinforcement Learning (mentioning)
confidence: 99%
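The "surrogate loss (1)" referenced in this statement is, in the standard PPO formulation (the citing paper's [19], Schulman et al.), the clipped surrogate objective; the equation numbered (1) in the citing paper may differ in detail, so the form below is the commonly cited one rather than a quotation:

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}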
“…Through this framework, a first-of-its-kind experiment has been proposed to implement behavioral demand response within an office building [22]. Prior work has proposed to describe an hourly price-setting controller that learns how to optimize its prices [21]. However, given the costliness of iterations in this experiment and the work that has been put into building a complex simulation environment, warm-starting the experiment's controller with learning from the simulation could prove valuable to its success.…”
Section: Introduction (mentioning)
confidence: 99%
“…With the online "vanilla" SAC optimization procedure, several decades' worth of real-world training data would have to be collected to fully train an hourly price-setting controller [23] in our Social Game. We seek to leverage a detailed simulation with behaviorally reasonable dynamics encoded in a model that can train on both simulated and experimental environments to accelerate this process.…”
Section: Offline-Online Reinforcement Learning (mentioning)
confidence: 99%
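For context on why online "vanilla" SAC is so data-hungry in this setting: SAC learns purely from environment interaction, so every update is ultimately paid for with real price-response data unless a simulator or planning model supplies additional experience. The standard form of the maximum-entropy objective SAC optimizes (not quoted from the citing paper) is:

J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

where \alpha is the temperature weighting the entropy bonus \mathcal{H} against the reward.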
“…Through this framework, a first-of-its-kind experiment has been proposed to implement behavioral demand response within an office building [24]. Prior work has proposed to describe an hourly price-setting controller that learns how to optimize its prices [23] to maximize efficient energy usage by workers. However, the use of an AI price-setting controller gives rise to a tradeoff between energy cost and data cost.…”
Section: Introduction (mentioning)
confidence: 99%
“…Through this framework, a first-of-its-kind experiment has been proposed to implement behavioral demand response within an office building [21]. Prior work has proposed to describe an hourly price-setting controller that learns how to optimize its prices [20].…”
(mentioning)
confidence: 99%