2012
DOI: 10.1109/tsmcc.2011.2106494

Experience Replay for Real-Time Reinforcement Learning Control

Cited by 200 publications (100 citation statements)
References 20 publications
“…This enhanced its generalization capability by increasing the diversity of training data. Nevertheless, current GPS schemes can only train policies in batch mode over different tasks, and are known to struggle with incremental data processing, particularly in robotic applications [12], [13], [14]. Specifically, GPS methods will not work if the training tasks are presented sequentially rather than being collectively available during the early training period.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the Q function above, θ represents the parameters (weights) of the neural network, which are updated after each decision (selected action). Furthermore, training a DRL agent requires a dataset of experiences D = {e_1, ..., e_N} (also called an 'experience replay memory' [23,24]) collected during online learning, where every experience is described as a tuple e_t = (s_t, a_t, r_t, s_{t+1}). Inducing the Q function consists in applying Q-learning updates over minibatches of experience MB = {(s, a, r, s') ∼ U(D)} drawn uniformly at random from the full dataset D. A Q-learning update at iteration i is thus defined according to the loss function…”
Section: Introduction (mentioning)
confidence: 99%
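
The excerpt above describes the standard experience-replay scheme: store experiences e_t = (s_t, a_t, r_t, s_{t+1}) in a memory D, sample minibatches uniformly at random, and apply Q-learning updates toward bootstrapped targets. The following is a minimal sketch of that idea, not the cited paper's implementation; the toy linear Q-function, buffer capacity, batch size, learning rate, and discount factor are illustrative assumptions.

```python
# Sketch: experience replay memory D with uniform minibatch sampling and a
# Q-learning update on a toy linear Q-function (assumed, not from the paper).
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-capacity store of experiences e_t = (s_t, a_t, r_t, s_{t+1})."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Minibatch MB = {(s, a, r, s') ~ U(D)}: uniform draws from the memory.
        batch = random.sample(list(self.buffer), batch_size)
        s, a, r, s_next = map(np.array, zip(*batch))
        return s, a, r, s_next


def q_learning_targets(q_next, r, gamma=0.99):
    """Bootstrap targets y = r + gamma * max_a' Q(s', a')."""
    return r + gamma * q_next.max(axis=1)


# Usage sketch with a toy linear Q-function Q(s, a) = s @ W[:, a].
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
W = rng.normal(size=(n_states, n_actions))          # parameters theta
memory = ReplayMemory(capacity=1000)

# Collect a few synthetic experiences "online" (one-hot states, random actions).
for _ in range(200):
    s = np.eye(n_states)[rng.integers(n_states)]
    a = int(rng.integers(n_actions))
    r = float(rng.normal())
    s_next = np.eye(n_states)[rng.integers(n_states)]
    memory.push(s, a, r, s_next)

# One Q-learning update over a uniformly sampled minibatch.
s, a, r, s_next = memory.sample(batch_size=32)
q_next = s_next @ W                                  # Q(s', .)
targets = q_learning_targets(q_next, r)
q_sa = (s @ W)[np.arange(len(a)), a]                 # Q(s, a)
td_error = targets - q_sa
loss = np.mean(td_error ** 2)                        # squared TD loss
# Gradient step on the linear parameters (learning rate 0.1 is arbitrary).
for i, (si, ai) in enumerate(zip(s, a)):
    W[:, ai] += 0.1 * td_error[i] * si / len(a)
print(f"minibatch loss: {loss:.3f}")
```

Sampling uniformly from the full memory is what breaks the temporal correlation of consecutive experiences, which is the main motivation for replay in the excerpt.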
“…Reinforcement learning is an unsupervised learning approach and is widely used in real-time control [9]. In reinforcement learning, the controller interacts with the external environment through trial and error.…”
Section: Introduction (mentioning)
confidence: 99%
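
The trial-and-error interaction mentioned above can be made concrete with a minimal tabular Q-learning loop. The sketch below uses a hypothetical 5-state chain environment and illustrative hyper-parameters; nothing here is taken from the cited paper.

```python
# Sketch: trial-and-error interaction between a tabular Q-learning controller
# and a toy 5-state chain environment (assumed for illustration only).
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(s, a):
    """Move along the chain; reaching the last state pays +1 and resets."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    if s_next == n_states - 1:
        return 0, 1.0               # reset to start with reward 1
    return s_next, 0.0

s = 0
for t in range(5000):
    # Trial: epsilon-greedy action selection (explore vs. exploit).
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
    s_next, r = step(s, a)
    # Error-driven update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q.round(2))   # the learned values should favor action 1 (right) in every state
```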