The increasing share of renewable energy sources in the electricity grid increases the uncertainty in electrical energy production. In response, demand-side flexibility has been proposed as part of the solution. An important source of flexibility available on the residential consumer side is thermostatically controlled loads (TCLs). In this paper, this source of flexibility is activated by applying batch reinforcement learning (BRL) to an electric water heater (EWH) in a Time of Use (ToU) setting. The cost performance of six BRL agents, each with a different state space, is compared quantitatively. In every case, the BRL agent successfully shifts energy consumption within 20-25 days. The performance of an agent with access to multiple temperature sensors along the height of the EWH is comparable to that of an agent with access to only the highest temperature sensor. This indicates that manufacturing costs related to sensors can be reduced while maintaining the same performance. Additionally, results show that including a theoretical state of charge value in the state space increases performance by more than 8% compared to the other BRL agents. It is therefore argued that an estimate of the state of charge should be included in future work, as it would increase cost performance.
With the increasing share of renewable energy sources in the electricity grid comes the need to exploit the available flexibility on the demand side. Demand response programs seek to exploit the flexibility of consumers by motivating end users to shift demand based on grid signals. An important source of flexibility available at residential consumers is Thermostatically Controlled Loads (TCLs). Additionally, recent advances in reinforcement learning have made it possible to apply this technique to a wide range of problems. Driven by these promising examples, a Batch Reinforcement Learning (BRL) algorithm is applied to a TCL. An important property of the complex optimization problem studied here is its partial observability. The main contribution of this paper is the application of BRL to a detailed building and heating system model, implemented in Modelica. A detailed TCL model allows an in-depth analysis of the effects of partial observability on the performance of the chosen control strategy. Ultimately, this paper illustrates that Modelica can be used to provide a detailed environment for a BRL algorithm. Finally, the learned control policy is compared with two other control policies. The obtained policy outperforms both and is economically feasible after a limited number of training days.