2019
DOI: 10.1016/j.enbuild.2019.07.029

Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning

Cited by 233 publications (98 citation statements)
References 39 publications
“…Since there are many DRL methods in the … [47]. In most existing works on DRL for building energy systems, model-free methods have been used, and these can be further classified into several types as in [48]: … [63], Advantage Actor-Critic (A2C) [64], Asynchronous Advantage Actor-Critic (A3C) [65]), and maximum entropy methods (e.g., Multi-Actor Attention-Critic (MAAC) [17], Entropy-Based Collective Advantage Actor-Critic (EB-C-A2C) [27], Entropy-Based Collective Deep Q-Network (EB-C-DQN) [27]). Among the above-mentioned methods, Q-learning methods do not support continuous actions.…”
Section: DRL Classification
confidence: 99%
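The distinction this statement draws, value-based Q-learning being restricted to a fixed discrete action set while actor-critic methods such as DDPG can emit continuous control signals, can be illustrated with a minimal sketch. The network sizes, state features, and setpoint range below are assumptions made for illustration only and are not taken from the cited works.

```python
# Minimal sketch (illustrative only): a DQN-style critic picks the argmax over a
# fixed set of discrete setpoints, while a DDPG-style deterministic actor maps
# the state directly to a continuous setpoint. All dimensions are assumptions.
import torch
import torch.nn as nn

STATE_DIM = 8                                   # e.g., zone temps, outdoor temp, occupancy, time
DISCRETE_ACTIONS = [20.0, 22.0, 24.0, 26.0]     # candidate supply setpoints (degC)
SETPOINT_LOW, SETPOINT_HIGH = 18.0, 28.0        # continuous setpoint range (degC)

class DQNCritic(nn.Module):
    """Q-network: one Q-value per discrete action; control = argmax."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, len(DISCRETE_ACTIONS)),
        )

    def act(self, state: torch.Tensor) -> float:
        q_values = self.net(state)              # shape: [num_actions]
        return DISCRETE_ACTIONS[int(q_values.argmax())]

class DDPGActor(nn.Module):
    """Deterministic policy: maps the state to any setpoint in the allowed range."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),         # squash to [-1, 1]
        )

    def act(self, state: torch.Tensor) -> float:
        scaled = self.net(state)                 # value in [-1, 1]
        mid = (SETPOINT_HIGH + SETPOINT_LOW) / 2
        half = (SETPOINT_HIGH - SETPOINT_LOW) / 2
        return float(mid + half * scaled)        # continuous setpoint

state = torch.randn(STATE_DIM)
print("DQN setpoint (discrete):", DQNCritic().act(state))
print("DDPG setpoint (continuous):", DDPGActor().act(state))
```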
“…In model-based methods, DRL agents need to learn building environment models from historical data, e.g., Long Short-Term Memory-Deep Deterministic Policy Gradients (LSTM-DDPG) [46], differentiable MPC policy-Proximal Policy Optimization (differentiable MPC policy-PPO) [47]. … [63], Advantage Actor-Critic (A2C) [64], Asynchronous Advantage Actor-Critic (A3C) [65]), and maximum entropy methods (e.g., Multi-Actor Attention-Critic (MAAC) [17], Entropy-Based Collective Advantage Actor-Critic (EB-C-A2C) [27], Entropy-Based Collective Deep Q-Network (EB-C-DQN) [27]). Among the above-mentioned methods, Q-learning methods do not support continuous actions.…”
Section: Applications of DRL in a Single Building
confidence: 99%
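As context for the model-based route this statement mentions (learning a building environment model from historical data, as in the cited LSTM-DDPG work), the sketch below shows how a recurrent dynamics model might be fitted in a supervised way and then used as the environment for a DRL controller. The feature layout, horizon, and training data are assumptions for illustration and do not reflect the cited implementation.

```python
# Minimal sketch (illustrative only): fit an LSTM on historical building data so
# a DRL agent can roll out "what-if" trajectories. Feature layout, window length,
# and the synthetic data are assumptions, not details of the cited papers.
import torch
import torch.nn as nn

HIST_LEN = 24          # hours of history fed to the model (assumption)
FEATURES = 6           # e.g., zone temp, outdoor temp, occupancy, setpoint action
HIDDEN = 32

class BuildingDynamicsLSTM(nn.Module):
    """Predicts the next zone temperature from a window of past states/actions."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEATURES, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, 1)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: [batch, HIST_LEN, FEATURES] -> predicted next temp: [batch, 1]
        out, _ = self.lstm(history)
        return self.head(out[:, -1, :])

model = BuildingDynamicsLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One supervised training step on synthetic data standing in for BMS logs; the
# fitted model would then serve as the simulated environment for the controller.
histories = torch.randn(16, HIST_LEN, FEATURES)
next_temps = torch.randn(16, 1)
loss = loss_fn(model(histories), next_temps)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("training loss:", float(loss))
```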