2019
DOI: 10.1007/s10994-019-05788-0

TD-regularized actor-critic methods

Abstract: Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and the critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objective of the actor by penalizing the temporal difference (TD) error of the critic. This improves stability by av…
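
A minimal sketch of the idea in the abstract, assuming a tabular critic, a softmax actor, and a penalty equal to the squared one-step TD error weighted by a coefficient eta; the variable names and the score-function treatment of the penalty gradient are illustrative assumptions rather than the authors' reference implementation.

```python
# Illustrative sketch of a TD-regularized actor-critic update on a single
# transition. Assumptions (not the paper's reference implementation):
# tabular critic, softmax actor, squared TD error as the penalty, and a
# score-function estimator for the penalty's gradient.
import numpy as np

n_states, n_actions = 4, 2
gamma = 0.99          # discount factor
eta = 0.1             # weight of the TD-error penalty on the actor objective
alpha_actor, alpha_critic = 0.01, 0.1

V = np.zeros(n_states)                    # critic: state-value table
theta = np.zeros((n_states, n_actions))   # actor: softmax logits

def policy(s):
    """Action probabilities pi(.|s) under a softmax over theta[s]."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def td_regularized_update(s, a, r, s_next):
    # Critic: one-step TD error and TD(0) update.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * td_error

    # Actor: policy-gradient step that uses the TD error as the advantage
    # estimate, minus a penalty proportional to the squared TD error, which
    # discourages the actor from reinforcing actions where the critic's
    # value estimate is still inaccurate.
    p = policy(s)
    grad_log_pi = -p
    grad_log_pi[a] += 1.0     # gradient of log pi(a|s) w.r.t. theta[s]
    theta[s] += alpha_actor * (td_error - eta * td_error ** 2) * grad_log_pi

# Toy usage on a single transition.
td_regularized_update(s=0, a=1, r=1.0, s_next=2)
print("V:", V, "logits(s=0):", theta[0])
```

The sketch only shows the coupling: the larger the critic's TD error on a transition, the more the penalty works against reinforcing the sampled action there, so the actor moves cautiously where the critic is still inaccurate.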

Cited by 33 publications (24 citation statements)
References 15 publications
“…Action-state based inference along with the ability to optimize value-functions and policies in separate, iterative steps has led to variational expectation-maximization in actor-critic algorithms [56]. Our experiments demonstrate that actor-critic algorithms in conjunction with VAE outperform the state-of-the-art methods based solely on soft-actor critic or deterministic value-functions in self-driving domain.…”
Section: Deep Reinforcement Learning
confidence: 89%
“…If a random variable can be any real number with equal probability then it is highly unpredictable and has very high entropy [60]. A high entropy in policy encourages exploration, and assigns equal probabilities to actions that have same or nearly equal Q-values [56]. It ensures that exploration does not collapse into repeatedly selecting a particular action leading to inconsistency in the approximated Q-function by assigning a high probability to any one action out of the possible set of actions [42].…”
Section: Soft Actor-Critic (SAC)
confidence: 99%
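
The statement above ties a policy's entropy to how evenly it spreads probability over actions with similar Q-values. A small worked example, using a softmax over made-up Q-values (the numbers are illustrative only), shows that near-equal Q-values yield near-maximal entropy while one dominant Q-value collapses it:

```python
# Entropy of a softmax policy over Q-values: near-equal Q-values give a
# near-uniform, high-entropy policy; one dominant Q-value gives low entropy.
# The Q-values below are made-up numbers for illustration only.
import numpy as np

def softmax(q, temperature=1.0):
    z = (q - q.max()) / temperature
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

q_similar  = np.array([1.00, 1.01, 0.99])   # nearly equal Q-values
q_dominant = np.array([5.00, 1.01, 0.99])   # one action clearly better

for name, q in [("similar", q_similar), ("dominant", q_dominant)]:
    p = softmax(q)
    print(name, "policy:", np.round(p, 3),
          "entropy:", round(float(entropy(p)), 3),
          "max entropy:", round(float(np.log(len(q))), 3))
```

With nearly equal Q-values the policy is close to uniform and its entropy approaches log(3); with one dominant Q-value most probability mass collapses onto a single action and the entropy drops sharply.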
“…In the existing literature, the driving environment is usually Rayleigh distributed [ 8 ]. Actor-critic methods have achieved incredible performance on RL problems such as games, but they are prone to instability due to frequent interaction between the actor and critic during learning [ 7 ]. An inaccurate step taken at one stage might adversely affect the subsequent steps, destabilizing the learning.…”
Section: Literature Review
confidence: 99%
“…Deep reinforcement learning has been widely applied to various problems, predominantly in game playing [ 7 , 8 ]. Deep reinforcement learning has also been extensively applied to resource allocation and channel estimation problems in wireless communication, autonomous routing and self-healing in networking, localization and path-planning in unmanned air vehicles (UAV), smart-drones and underwater communications.…”
Section: Introduction
confidence: 99%