2022
DOI: 10.48550/arxiv.2205.07802
Preprint

The Primacy Bias in Deep Reinforcement Learning

Cited by 1 publication (1 citation statement)
References 0 publications
Year Published: 2023
“…As a result, an experience will be used too many times to update the neural networks, which could limit the learning performance to a certain extent [26,27]. In order to relieve the adverse effect of the primacy bias, the weight parameters of the neural networks will be reset regularly [28]. When the outputs of the Q networks become stable, which means the q-function converges, the reset operation will be performed.…”
Section: Training Methods (citation type: mentioning)
confidence: 99%
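
The citation statement describes the periodic-reset remedy for the primacy bias: the Q-network's weight parameters are re-initialized from scratch at intervals while the replay buffer is kept intact. Below is a minimal PyTorch-style sketch of that idea; the network shape, the reset interval, and the training-loop scaffolding are illustrative assumptions, not the exact procedure of the cited papers.

import copy
import torch.nn as nn

def reset_parameters(network: nn.Module) -> None:
    # Re-initialize every submodule that defines reset_parameters(),
    # discarding learned weights while the replay buffer stays untouched.
    for layer in network.modules():
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()

# Illustrative hyperparameters (assumptions, not values from the papers).
RESET_INTERVAL = 200_000   # gradient steps between resets
TOTAL_STEPS = 1_000_000

q_network = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 4))
target_network = copy.deepcopy(q_network)

for step in range(1, TOTAL_STEPS + 1):
    # ... sample a batch from the replay buffer, compute the TD loss,
    # and take a gradient step on q_network (omitted) ...
    if step % RESET_INTERVAL == 0:
        reset_parameters(q_network)                             # forget early overfitting
        target_network.load_state_dict(q_network.state_dict())  # keep target consistent

Note that the citing paper triggers the reset when the Q-networks' outputs become stable rather than on a fixed schedule; in the sketch above, that would mean replacing the step % RESET_INTERVAL test with a stability check on the Q-network's outputs over a probe batch.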