2019
DOI: 10.1007/s10994-019-05788-0

TD-regularized actor-critic methods

Abstract: Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and the critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by av…
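
A minimal sketch of the idea in the abstract, assuming a tabular critic, a softmax actor, and a penalty equal to the squared one-step TD error weighted by a coefficient eta; the variable names and the score-function treatment of the penalty gradient are illustrative assumptions rather than the authors' reference implementation.

```python
# Illustrative sketch of a TD-regularized actor-critic update on a single
# transition. Assumptions (not the paper's reference implementation):
# tabular critic, softmax actor, squared TD error as the penalty, and a
# score-function estimator for the penalty's gradient.
import numpy as np

n_states, n_actions = 4, 2
gamma = 0.99          # discount factor
eta = 0.1             # weight of the TD-error penalty on the actor objective
alpha_actor, alpha_critic = 0.01, 0.1

V = np.zeros(n_states)                    # critic: state-value table
theta = np.zeros((n_states, n_actions))   # actor: softmax logits

def policy(s):
    """Action probabilities pi(.|s) under a softmax over theta[s]."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def td_regularized_update(s, a, r, s_next):
    # Critic: one-step TD error and TD(0) update.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * td_error

    # Actor: policy-gradient step that uses the TD error as the advantage
    # estimate, minus a penalty proportional to the squared TD error, which
    # discourages the actor from reinforcing actions where the critic's
    # value estimate is still inaccurate.
    p = policy(s)
    grad_log_pi = -p
    grad_log_pi[a] += 1.0     # gradient of log pi(a|s) w.r.t. theta[s]
    theta[s] += alpha_actor * (td_error - eta * td_error ** 2) * grad_log_pi

# Toy usage on a single transition.
td_regularized_update(s=0, a=1, r=1.0, s_next=2)
print("V:", V, "logits(s=0):", theta[0])
```

The sketch only shows the coupling: the larger the critic's TD error on a transition, the more the penalty works against reinforcing the sampled action there, so the actor moves cautiously where the critic is still inaccurate.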

Cited by 33 publications (24 citation statements)
References 15 publications
“…Action-state based inference along with the ability to optimize value-functions and policies in separate, iterative steps has led to variational expectation-maximization in actor-critic algorithms [56]. Our experiments demonstrate that actor-critic algorithms in conjunction with VAE outperform the state-of-the-art methods based solely on soft-actor critic or deterministic value-functions in self-driving domain.…”
Section: Deep Reinforcement Learning
confidence: 89%
“…If a random variable can be any real number with equal probability then it is highly unpredictable and has very high entropy [60]. A high entropy in policy encourages exploration, and assigns equal probabilities to actions that have same or nearly equal Q-values [56]. It ensures that exploration does not collapse into repeatedly selecting a particular action leading to inconsistency in the approximated Q-function by assigning a high probability to any one action out of the possible set of actions [42].…”
Section: Soft Actor-Critic (SAC)
confidence: 99%
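
The statement above ties a policy's entropy to how evenly it spreads probability over actions with similar Q-values. A small worked example, using a softmax over made-up Q-values (the numbers are illustrative only), shows that near-equal Q-values yield near-maximal entropy while one dominant Q-value collapses it:

```python
# Entropy of a softmax policy over Q-values: near-equal Q-values give a
# near-uniform, high-entropy policy; one dominant Q-value gives low entropy.
# The Q-values below are made-up numbers for illustration only.
import numpy as np

def softmax(q, temperature=1.0):
    z = (q - q.max()) / temperature
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

q_similar  = np.array([1.00, 1.01, 0.99])   # nearly equal Q-values
q_dominant = np.array([5.00, 1.01, 0.99])   # one action clearly better

for name, q in [("similar", q_similar), ("dominant", q_dominant)]:
    p = softmax(q)
    print(name, "policy:", np.round(p, 3),
          "entropy:", round(float(entropy(p)), 3),
          "max entropy:", round(float(np.log(len(q))), 3))
```

With nearly equal Q-values the policy is close to uniform and its entropy approaches log(3); with one dominant Q-value most probability mass collapses onto a single action and the entropy drops sharply.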
“…In the existing literature, the driving environment is usually Rayleigh distributed [ 8 ]. Actor-critic methods have achieved incredible performance on RL problems such as games, but they are prone to instability due to frequent interaction between the actor and critic during learning [ 7 ]. An inaccurate step taken at one stage might adversely affect the subsequent steps, destabilizing the learning.…”
Section: Literature Review
confidence: 99%
“…Deep reinforcement learning has been widely applied to various problems, predominantly in game playing [ 7 , 8 ]. Deep reinforcement learning has also been extensively applied to resource allocation and channel estimation problems in wireless communication, autonomous routing and self-healing in networking, localization and path-planning in unmanned air vehicles (UAV), smart-drones and underwater communications.…”
Section: Introduction
confidence: 99%