2022
DOI: 10.1016/j.epsr.2022.108375

Batch reinforcement learning for network-safe demand response in unknown electric grids

Cited by 9 publications (3 citation statements)
References 22 publications

“…Collectively, these techniques are categorized into four primary groups, as illustrated in Figure 1: artificial intelligence techniques, conventional methods, metaheuristic-based methods, and others [17]. Within the realm of artificial intelligence, notable algorithms include fuzzy logic [18], game theory [19], and various forms of reinforcement learning, such as Q-Learning [20], DQN [21], actor-critic methods [22], and TD3 [23]. These have been applied with considerable success to the challenge of energy management in microgrids.…”
Section: Literature Review
confidence: 99%
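
To make the first of those techniques concrete, the snippet below sketches a minimal tabular Q-learning loop for a toy load-scheduling decision. The environment, the hour-of-day state, the serve/defer actions, the price profile, and the reward are hypothetical placeholders for illustration, not the formulations used in the cited works [20]-[23].

```python
# Minimal tabular Q-learning loop for a toy demand-response decision.
# All quantities here (hour-of-day state, serve/defer actions, the price
# profile, and the reward) are hypothetical placeholders.
import random
from collections import defaultdict

N_STATES = 24            # state: hour of day
ACTIONS = [0, 1]         # 0 = defer load, 1 = serve load
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

Q = defaultdict(float)   # Q[(state, action)] -> value estimate

def step(state, action):
    """Hypothetical environment: serving load off-peak earns more reward."""
    price = 1.0 if 17 <= state <= 21 else 0.2   # peak vs. off-peak price
    reward = (1.0 - price) if action == 1 else 0.0
    return (state + 1) % N_STATES, reward

for episode in range(500):
    s = 0
    for _ in range(N_STATES):
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s_next, r = step(s, a)
        # standard Q-learning temporal-difference update
        td_target = r + GAMMA * max(Q[(s_next, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (td_target - Q[(s, a)])
        s = s_next
```
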
“…However, such approaches may require ad hoc tuning of the constraint violation reward and may result in unsafe decisions during the exploration phase. In the second category, the safety of the decisions is promoted by offline (batch) learning to initialize the exploration [16] or by the transfer of expert knowledge learned offline to guide the exploration [17]-[19]. Despite significant improvements, these approaches cannot provide safety guarantees and are not suitable for fully online learning.…”
Section: Introduction
confidence: 99%
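
The batch-initialization idea quoted above can be sketched briefly: fit a Q-function offline on logged transitions (tabular fitted Q-iteration below), then warm-start online exploration from it rather than from scratch. The logged dataset, problem sizes, and hyperparameters are hypothetical.

```python
# Sketch of safety-oriented initialization via batch (offline) learning:
# fit a Q-function on previously logged transitions with tabular fitted
# Q-iteration, then use it to warm-start online exploration. The logged
# data and problem sizes below are hypothetical.
import numpy as np

def fitted_q_iteration(batch, n_actions, gamma=0.95, iters=50):
    """Tabular fitted Q-iteration over a fixed batch of (s, a, r, s') tuples."""
    n_states = 1 + max(max(s, s2) for s, _, _, s2 in batch)
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        Q_new = Q.copy()
        for s, a, r, s2 in batch:
            # Bellman backup using only the fixed offline batch
            Q_new[s, a] = r + gamma * Q[s2].max()
        Q = Q_new
    return Q

# Hypothetical logged transitions: (state, action, reward, next_state)
logged = [(0, 1, 0.8, 1), (1, 0, 0.0, 2), (2, 1, 0.5, 0), (1, 1, 0.9, 2)]
Q_init = fitted_q_iteration(logged, n_actions=2)

# Online learning then starts from Q_init rather than from zeros, so early
# exploratory actions are guided by offline knowledge instead of chance.
```

Starting exploration from Q_init biases early actions toward behavior supported by the batch, which is the safety-promoting effect the quoted passage describes; as the passage also notes, this improves but does not guarantee safety.
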
“…However, such approaches may require ad hoc tuning of the constraint violation reward and may result in unsafe decisions during the exploration phase. In the second category, the safety of the decisions is promoted by offline (batch) learning to initialize the exploration [12], or by the transfer of expert knowledge learned offline to guide the exploration [13]-[15]. Despite significant improvements, these approaches cannot provide theoretical safety guarantees, and are not suitable for fully online learning.…”
Section: Introduction
confidence: 99%