Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems 2007
DOI: 10.1145/1329125.1329241

Batch reinforcement learning in a complex domain

Abstract: Temporal difference reinforcement learning algorithms are perfectly suited to autonomous agents because they learn directly from an agent's experience based on sequential actions in the environment. However, their most common algorithmic variants are relatively inefficient in their use of experience data, which in many agent-based settings can be scarce. In particular, they make just one learning "update" for each atomic experience. Batch reinforcement learning algorithms, on the other hand, aim to achieve gre…
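The abstract's contrast is between online temporal difference updates, one per observed transition, and batch methods that reuse stored experience. As a point of reference, here is a minimal Python sketch (not from the paper) of the usual one-update-per-experience Q-learning loop; the environment interface, action representation, and hyperparameters are assumptions for illustration.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (not from the paper).
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

def td_update(Q, s, a, r, s_next, actions):
    """One temporal-difference (Q-learning) update for a single transition."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def run_online(env, actions, n_steps=10_000):
    """Online TD control: each transition is used for exactly one update and
    then discarded.  `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done)."""
    Q = defaultdict(float)
    s = env.reset()
    for _ in range(n_steps):
        a = (random.choice(actions) if random.random() < EPSILON
             else max(actions, key=lambda b: Q[(s, b)]))
        s_next, r, done = env.step(a)
        td_update(Q, s, a, r, s_next, actions)  # one update per atomic experience
        s = env.reset() if done else s_next
    return Q
```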

Cited by 53 publications (40 citation statements)
References 13 publications
“…Examples of techniques to reduce the state space dimension have been used by Riedmiller et al (2009). In this work, the authors applied neural networks as function approximators together with fast learning algorithms (Kalyanakrishnan and Stone 2007).…”
Section: Automatic Design Methods
confidence: 99%
“…One possible approach to alleviate this problem is to store transition samples in a database and reuse them multiple times, similarly to how the batch algorithms of the previous section work. This procedure is known as experience replay (Lin, 1992; Kalyanakrishnan and Stone, 2007). Another option is to employ so-called eligibility traces, which allow the parameter updates at the current step to also incorporate information about recently observed transitions (e.g., Singh and Sutton, 1996).…”
Section: Online Model-free Approximate Value Iteration
confidence: 99%
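The citation statement above describes experience replay: rather than discarding each transition after a single update, samples are stored and reused several times. A minimal sketch of that idea, assuming the same tabular Q-learning update as in the earlier sketch; the buffer size and replay count are illustrative constants, not taken from the cited works.

```python
import random
from collections import defaultdict, deque

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # illustrative constants

def q_update(Q, s, a, r, s_next, actions):
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def learn_with_replay(env, actions, n_steps=10_000,
                      buffer_size=5_000, replays_per_step=8):
    """Store transitions and replay random samples from the buffer at every
    step, so each piece of experience contributes to many updates
    (in the spirit of Lin, 1992)."""
    Q = defaultdict(float)
    buffer = deque(maxlen=buffer_size)
    s = env.reset()
    for _ in range(n_steps):
        a = (random.choice(actions) if random.random() < EPSILON
             else max(actions, key=lambda b: Q[(s, b)]))
        s_next, r, done = env.step(a)
        buffer.append((s, a, r, s_next))
        # Reuse stored experience: several updates per environment step.
        for s_i, a_i, r_i, sn_i in random.sample(list(buffer),
                                                 min(replays_per_step, len(buffer))):
            q_update(Q, s_i, a_i, r_i, sn_i, actions)
        s = env.reset() if done else s_next
    return Q
```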
“…In the literature, this growing batch approach can be found in several different guises; the number of alternations between episodes of exploration and episodes of learning can range from being as close to the pure batch approach as using only two iterations, to recalculating the policy after every few interactions, e.g. after finishing one episode in a shortest-path problem (Kalyanakrishnan and Stone, 2007; Lange and Riedmiller, 2010a). In practice, the growing batch approach is the modeling of choice when applying batch reinforcement learning algorithms to real systems.…”
Section: The Growing Batch Learning Problem
confidence: 99%
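The excerpt describes the growing batch pattern: alternate between collecting a few episodes with the current policy and re-learning on everything gathered so far. A schematic sketch of that loop under assumed interfaces; the tabular re-fitting routine is a simple stand-in for a real batch learner such as fitted Q iteration, and all constants are illustrative.

```python
import random
from collections import defaultdict

GAMMA, EPSILON = 0.99, 0.2   # illustrative constants

def collect_episode(env, Q, actions, max_len=200):
    """Interaction phase: one epsilon-greedy episode with the current Q."""
    s, episode = env.reset(), []
    for _ in range(max_len):
        a = (random.choice(actions) if random.random() < EPSILON
             else max(actions, key=lambda b: Q[(s, b)]))
        s_next, r, done = env.step(a)     # assumed gym-style interface
        episode.append((s, a, r, s_next, done))
        if done:
            break
        s = s_next
    return episode

def refit(data, actions, sweeps=50):
    """Learning phase: recompute Q from scratch on the whole accumulated
    batch by sweeping repeatedly over the stored transitions (a tabular
    stand-in for a batch learner such as fitted Q iteration)."""
    Q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in data:
            target = r if done else r + GAMMA * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] = target
    return Q

def growing_batch(env, actions, n_rounds=20, episodes_per_round=5):
    data, Q = [], defaultdict(float)
    for _ in range(n_rounds):
        for _ in range(episodes_per_round):        # explore with current policy
            data.extend(collect_episode(env, Q, actions))
        Q = refit(data, actions)                   # re-learn on ALL data so far
    return Q
```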
“…For example, the growing batch approach could be classified as an online method (it interacts with the system like an online method and incrementally improves its policy as new experience becomes available) as well as, from a data usage perspective, being seen as a batch algorithm, since it stores all experience and uses 'batch methods' to learn from these observations. Although FQI, like KADP and LSPI, has been proposed by Ernst as a pure batch algorithm working on a fixed set of samples, it can easily be adapted to the growing batch setting, as, for example, shown by Kalyanakrishnan and Stone (2007). This holds true for every 'pure' batch approach.…”
Section: Identifying Batch Algorithms
confidence: 99%
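The excerpt notes that FQI, posed by Ernst et al. as a pure batch method over a fixed sample set, carries over to the growing batch setting simply by re-running it on the accumulated samples after each round of interaction. Below is a hedged sketch of the core FQI regression loop using scikit-learn's ExtraTreesRegressor (tree ensembles are the regressor family associated with Ernst-style FQI); the state/action encoding and all constants are illustrative assumptions, not the cited authors' code.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor  # tree-ensemble regressor

GAMMA = 0.99   # illustrative discount factor

def fitted_q_iteration(transitions, actions, n_iterations=30):
    """Sketch of FQI: `transitions` is a list of (s, a, r, s_next, done)
    with states as 1-D feature arrays and discrete numeric actions."""
    X = np.array([np.append(s, a) for s, a, _, _, _ in transitions])
    R = np.array([r for _, _, r, _, _ in transitions], dtype=float)
    S_next = [s_next for _, _, _, s_next, _ in transitions]
    done = np.array([d for _, _, _, _, d in transitions], dtype=float)

    model = None
    for _ in range(n_iterations):
        if model is None:
            targets = R                            # first iterate: immediate reward only
        else:
            # max over actions of the previous iterate's prediction at s'
            q_next = np.column_stack([
                model.predict(np.array([np.append(s, a) for s in S_next]))
                for a in actions])
            targets = R + GAMMA * (1.0 - done) * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return model   # Q(s, a) is approximated by model.predict([np.append(s, a)])
```

In a growing batch loop, this function would simply be called again on the extended transition list after each batch of new episodes, which is the adaptation the excerpt attributes to Kalyanakrishnan and Stone (2007).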