2017
DOI: 10.1109/tsg.2016.2517211

Residential Demand Response of Thermostatically Controlled Loads Using Batch Reinforcement Learning

Abstract: Driven by recent advances in batch Reinforcement Learning (RL), this paper contributes to the application of batch RL to demand response. In contrast to conventional model-based approaches, batch RL techniques do not require a system identification step, making them more suitable for a large-scale implementation. This paper extends fitted Q-iteration, a standard batch RL technique, to the situation when a forecast of the exogenous data is provided. In general, batch RL techniques do not rely on expert knowledge…
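For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of standard fitted Q-iteration, not the paper's forecast-aware extension. Everything in it is an assumption made for illustration: the three-bin thermostat toy problem, the cost values, the per-(state, action) averaging regressor (chosen only to avoid external dependencies), and the function names `fitted_q_iteration` and `greedy_policy`.

```python
from collections import defaultdict


def fitted_q_iteration(batch, actions, gamma=0.9, iterations=50):
    """Standard fitted Q-iteration on a fixed batch of (s, a, cost, s_next) tuples.

    Costs are minimized. The "regressor" is a plain average of the targets
    per (state, action) pair, an assumption made to keep the sketch small.
    """
    q = defaultdict(float)  # Q_0 is zero everywhere
    for _ in range(iterations):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, cost, s_next in batch:
            # Regression target: immediate cost plus discounted best next value.
            target = cost + gamma * min(q[(s_next, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # "Fit" the next Q-function to the targets by averaging.
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


def greedy_policy(q, states, actions):
    """Pick the lowest-cost action in every state."""
    return {s: min(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical toy problem: temperature bins 0 (cold), 1, 2 (warm).
# Action 1 (heat) raises the bin by one, action 0 (off) lowers it by one.
# Cost: 1 per heating step, plus 5 whenever the next bin is cold.
states, actions = [0, 1, 2], [0, 1]
batch = []
for s in states:
    for a in actions:
        s_next = min(s + 1, 2) if a == 1 else max(s - 1, 0)
        cost = 1.0 * a + (5.0 if s_next == 0 else 0.0)
        batch.append((s, a, cost, s_next))

q = fitted_q_iteration(batch, actions)
policy = greedy_policy(q, states, actions)
```

The batch here covers every state-action pair once, so no system model is identified and no further interaction with the environment is needed; the policy is derived from the recorded transitions alone. With a sparser batch, unseen pairs would default to a Q-value of zero in this sketch, which a real function approximator would handle differently.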

citations
Cited by 272 publications
(144 citation statements)
references
References 31 publications
“…The usual roles of domain knowledge are: · Making the computations necessary for solving the problem more time- or space-efficient, · Guiding the solution process, ( (2009)). Work presented in Ruelens et al (2016) extended the fitted Q iteration algorithm in order to take advantage of domain-specific knowledge (in particular case a forecast of the exogenous data is provided to design demand response control). Q(λ) with eligibility traces is used to take advantage of domain-specific knowledge in Yu et al (2011).…”
Section: Past and Recent Considerations Of RL For Electric Power Syst
Citation type: mentioning
confidence: 99%
“…Power system components considered include: dynamic brake Ernst et al (2004); Glavic (2005), thyristor controlled series capacitor Ernst et al (2004, 2009), quadrature booster Li and Wu (1999), synchronous generators (all AGC related references), individual or aggregated loads Vandael et al (2015); Ruelens et al (2016), etc. If used as a multi-agent system, then additional state variables must be introduced to ensure convergence of these essentially distributed computation schemes, and an adapted variant of standard RL methods is often used (for example correlated equilibrium Q(λ) Yu et al (2012a)).…”
Section: Past and Recent Considerations Of RL For Electric Power Syst
Citation type: mentioning
confidence: 99%
“…This results in a slow convergence rate of the Q-learning algorithm to an optimal policy [21]; more observations are needed to construct a control policy. In batch RL techniques (off-line RL) [22,23], a controller estimates a control policy based on a batch of its past experiences.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The ability of batch RL to reuse their past experiences makes them converge faster than online RL methods like Q-learning and SARSA. Batch RL has been used for demand response in [21, 24–26]. Vandael et al [27] used a batch RL technique to find a day-ahead consumption plan of a cluster of electric vehicles.…”
Section: Introductionmentioning
confidence: 99%
“…The behaviour of these representative devices allows to capture the behaviour of the entire set of the aggregated ones (e.g., [1], [2], [3], [4]). The second approach is model-free since it infers the behaviour of the distributed devices from the interaction between them and a central unit (i.e., the aggregator) (e.g., [5], [6]). Usually, these approaches adopt data-driven learning techniques.…”
Section: Introductionmentioning
confidence: 99%