A Sarsa-based adaptive controller for building energy conservation

Fu, Qiming; Hu, Lingyao; Wu, Hongjie; Hu, Fuyuan; Wen, Hao; Chen, Jianping

doi:10.3233/jcm-180792

Cited by 4 publications

(1 citation statement)

References 10 publications

“…For more efficient conservation of the building energy, RL has been applied to optimize heating, ventilation, and air conditioning parameters (Yu et al., 2021). The main RL algorithms applied in building energy control are tabular Q‐learning (S. Liu & Henze, 2006; Yang et al., 2015), deep Q‐network (Ahn & Park, 2020), deep deterministic policy gradient (DDPG; Du et al., 2021), advantage actor critic (Morinibu et al., 2019), asynchronous advantage actor‐critic (Z. Zhang et al., 2019), double deep Q‐learning, and state‐action‐reward‐state‐action (Fu et al., 2018).…”

Section: Introduction (mentioning)

confidence: 99%

Visual comfort generative design framework based on parametric network in underground space

Gui, Zhou, Xie. 2022. Computer aided Civil Eng


With the growing demand for a high‐quality life, visual comfort (VC) is becoming increasingly important for improving the quality of underground spaces. The landscape features of an underground space can be defined by the spatial and material parameters of its components. This study proposes a novel parametric generative network (StepGN) for the 3D generative design of VC. It combines parametric modeling, VC evaluation, and a novel reinforcement learning (RL) model called encoded soft actor critic (ESAC), and it simplifies the optimization of complex VC scenes into a parameter‐generating optimization process. Parametric modeling is used to generate a 3D underground space scene with components and material properties, and the optimization of the parameters depends on the evaluation and RL models. The evaluation model provides the reward value, and a new ESAC algorithm is developed that combines the soft actor‐critic (SAC) algorithm with an encoder process and sets the reward with a confidence threshold. In addition, the Swin cosine distance (SCD) is used to measure the diversity of the generated scenes. A comparison of the policy types and range conversion methods shows that the stochastic policy and the Sigmoid function are more suitable for the generative design of VC. A comparison of StepGN with a generative adversarial network‐based generative network (VCGN) and other RL processes shows that StepGN can generate discrete distributions of the VC levels and can realize a high‐comfort level scene, and that the training speed and stability are considerably improved. Finally, StepGN is applied to the optimization of the Wujiaochang subway station scene in Shanghai, and the generated results are shown to provide a high level of VC.
