“…To conserve building energy more efficiently, RL has been applied to optimize heating, ventilation, and air conditioning parameters (Yu et al., 2021). The main RL algorithms applied in building energy control are tabular Q‐learning (S. Liu & Henze, 2006; Yang et al., 2015), deep Q‐network (Ahn & Park, 2020), deep deterministic policy gradient (DDPG; Du et al., 2021), advantage actor‐critic (Morinibu et al., 2019), asynchronous advantage actor‐critic (Z. Zhang et al., 2019), double deep Q‐learning, and state‐action‐reward‐state‐action (Fu et al., 2018).…”