2012
DOI: 10.1109/tsmcc.2011.2106494

Experience Replay for Real-Time Reinforcement Learning Control

Cited by 200 publications (100 citation statements)
References 20 publications
“…This enhanced its generalization capability by increasing the diversity of training data. Nevertheless, current GPS schemes can only train policies in batch mode over different tasks, and are known to struggle with incremental data processing, particularly in robotic applications [12], [13], [14]. Specifically, GPS methods will not work if the training tasks are presented sequentially rather than being collectively available during the early training period.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the Q function above, θ represents the parameters (weights) of the neural network, which are updated after each decision (selected action). Furthermore, training a DRL agent requires a dataset of experiences D = {e_1, ..., e_N} (also called an 'experience replay memory' [23,24]) collected during online learning, where every experience is described as a tuple e_t = (s_t, a_t, r_t, s_{t+1}). Inducing the Q function consists in applying Q-learning updates over minibatches of experience MB = {(s, a, r, s') ∼ U(D)} drawn uniformly at random from the full dataset D. A Q-learning update at iteration i is thus defined according to the loss function…”
Section: Introduction (mentioning)
confidence: 99%
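
The excerpt above describes the standard experience-replay scheme: store experiences e_t = (s_t, a_t, r_t, s_{t+1}) in a memory D, sample minibatches uniformly at random, and apply Q-learning updates toward bootstrapped targets. The following is a minimal sketch of that idea, not the cited paper's implementation; the toy linear Q-function, buffer capacity, batch size, learning rate, and discount factor are illustrative assumptions.

```python
# Sketch: experience replay memory D with uniform minibatch sampling and a
# Q-learning update on a toy linear Q-function (assumed, not from the paper).
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-capacity store of experiences e_t = (s_t, a_t, r_t, s_{t+1})."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Minibatch MB = {(s, a, r, s') ~ U(D)}: uniform draws from the memory.
        batch = random.sample(list(self.buffer), batch_size)
        s, a, r, s_next = map(np.array, zip(*batch))
        return s, a, r, s_next


def q_learning_targets(q_next, r, gamma=0.99):
    """Bootstrap targets y = r + gamma * max_a' Q(s', a')."""
    return r + gamma * q_next.max(axis=1)


# Usage sketch with a toy linear Q-function Q(s, a) = s @ W[:, a].
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
W = rng.normal(size=(n_states, n_actions))          # parameters theta
memory = ReplayMemory(capacity=1000)

# Collect a few synthetic experiences "online" (one-hot states, random actions).
for _ in range(200):
    s = np.eye(n_states)[rng.integers(n_states)]
    a = int(rng.integers(n_actions))
    r = float(rng.normal())
    s_next = np.eye(n_states)[rng.integers(n_states)]
    memory.push(s, a, r, s_next)

# One Q-learning update over a uniformly sampled minibatch.
s, a, r, s_next = memory.sample(batch_size=32)
q_next = s_next @ W                                  # Q(s', .)
targets = q_learning_targets(q_next, r)
q_sa = (s @ W)[np.arange(len(a)), a]                 # Q(s, a)
td_error = targets - q_sa
loss = np.mean(td_error ** 2)                        # squared TD loss
# Gradient step on the linear parameters (learning rate 0.1 is arbitrary).
for i, (si, ai) in enumerate(zip(s, a)):
    W[:, ai] += 0.1 * td_error[i] * si / len(a)
print(f"minibatch loss: {loss:.3f}")
```

Sampling uniformly from the full memory is what breaks the temporal correlation of consecutive experiences, which is the main motivation for replay in the excerpt.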
“…Reinforcement learning is an unsupervised learning approach and is widely used in real-time control [9]. In reinforcement learning, the controller interacts with the external environment through trial and error.…”
Section: Introduction (mentioning)
confidence: 99%
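
The trial-and-error interaction mentioned above can be made concrete with a minimal tabular Q-learning loop. The sketch below uses a hypothetical 5-state chain environment and illustrative hyper-parameters; nothing here is taken from the cited paper.

```python
# Sketch: trial-and-error interaction between a tabular Q-learning controller
# and a toy 5-state chain environment (assumed for illustration only).
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(s, a):
    """Move along the chain; reaching the last state pays +1 and resets."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    if s_next == n_states - 1:
        return 0, 1.0               # reset to start with reward 1
    return s_next, 0.0

s = 0
for t in range(5000):
    # Trial: epsilon-greedy action selection (explore vs. exploit).
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
    s_next, r = step(s, a)
    # Error-driven update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q.round(2))   # the learned values should favor action 1 (right) in every state
```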