Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems 2007
DOI: 10.1145/1329125.1329241

Batch reinforcement learning in a complex domain

Abstract: Temporal difference reinforcement learning algorithms are perfectly suited to autonomous agents because they learn directly from an agent's experience based on sequential actions in the environment. However, their most common algorithmic variants are relatively inefficient in their use of experience data, which in many agent-based settings can be scarce. In particular, they make just one learning "update" for each atomic experience. Batch reinforcement learning algorithms, on the other hand, aim to achieve gre…
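The abstract's contrast is between online temporal difference updates, one per observed transition, and batch methods that reuse stored experience. As a point of reference, here is a minimal Python sketch (not from the paper) of the usual one-update-per-experience Q-learning loop; the environment interface, action representation, and hyperparameters are assumptions for illustration.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (not from the paper).
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

def td_update(Q, s, a, r, s_next, actions):
    """One temporal-difference (Q-learning) update for a single transition."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def run_online(env, actions, n_steps=10_000):
    """Online TD control: each transition is used for exactly one update and
    then discarded.  `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done)."""
    Q = defaultdict(float)
    s = env.reset()
    for _ in range(n_steps):
        a = (random.choice(actions) if random.random() < EPSILON
             else max(actions, key=lambda b: Q[(s, b)]))
        s_next, r, done = env.step(a)
        td_update(Q, s, a, r, s_next, actions)  # one update per atomic experience
        s = env.reset() if done else s_next
    return Q
```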

Cited by 53 publications (40 citation statements)
References 13 publications
“…Examples of techniques to reduce the state space dimension have been used by Riedmiller et al (2009). In this work, the authors applied neural networks as function approximators together with fast learning algorithms (Kalyanakrishnan and Stone 2007).…”
Section: Automatic Design Methods
confidence: 99%
“…One possible approach to alleviate this problem is to store transition samples in a database and reuse them multiple times, similarly to how the batch algorithms of the previous section work. This procedure is known as experience replay (Lin, 1992; Kalyanakrishnan and Stone, 2007). Another option is to employ so-called eligibility traces, which allow the parameter updates at the current step to also incorporate information about recently observed transitions (e.g., Singh and Sutton, 1996).…”
Section: Online Model-free Approximate Value Iteration
confidence: 99%
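The citation statement above describes experience replay: rather than discarding each transition after a single update, samples are stored and reused several times. A minimal sketch of that idea, assuming the same tabular Q-learning update as in the earlier sketch; the buffer size and replay count are illustrative constants, not taken from the cited works.

```python
import random
from collections import defaultdict, deque

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # illustrative constants

def q_update(Q, s, a, r, s_next, actions):
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def learn_with_replay(env, actions, n_steps=10_000,
                      buffer_size=5_000, replays_per_step=8):
    """Store transitions and replay random samples from the buffer at every
    step, so each piece of experience contributes to many updates
    (in the spirit of Lin, 1992)."""
    Q = defaultdict(float)
    buffer = deque(maxlen=buffer_size)
    s = env.reset()
    for _ in range(n_steps):
        a = (random.choice(actions) if random.random() < EPSILON
             else max(actions, key=lambda b: Q[(s, b)]))
        s_next, r, done = env.step(a)
        buffer.append((s, a, r, s_next))
        # Reuse stored experience: several updates per environment step.
        for s_i, a_i, r_i, sn_i in random.sample(list(buffer),
                                                 min(replays_per_step, len(buffer))):
            q_update(Q, s_i, a_i, r_i, sn_i, actions)
        s = env.reset() if done else s_next
    return Q
```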
“…In the literature, this growing batch approach can be found in several different guises; the number of alternations between episodes of exploration and episodes of learning can range from being as close to the pure batch approach as using only two iterations, to recalculating the policy after every few interactions, e.g. after finishing one episode in a shortest-path problem (Kalyanakrishnan and Stone, 2007; Lange and Riedmiller, 2010a). In practice, the growing batch approach is the modeling of choice when applying batch reinforcement learning algorithms to real systems.…”
Section: The Growing Batch Learning Problem
confidence: 99%
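The excerpt describes the growing batch pattern: alternate between collecting a few episodes with the current policy and re-learning on everything gathered so far. A schematic sketch of that loop under assumed interfaces; the tabular re-fitting routine is a simple stand-in for a real batch learner such as fitted Q iteration, and all constants are illustrative.

```python
import random
from collections import defaultdict

GAMMA, EPSILON = 0.99, 0.2   # illustrative constants

def collect_episode(env, Q, actions, max_len=200):
    """Interaction phase: one epsilon-greedy episode with the current Q."""
    s, episode = env.reset(), []
    for _ in range(max_len):
        a = (random.choice(actions) if random.random() < EPSILON
             else max(actions, key=lambda b: Q[(s, b)]))
        s_next, r, done = env.step(a)     # assumed gym-style interface
        episode.append((s, a, r, s_next, done))
        if done:
            break
        s = s_next
    return episode

def refit(data, actions, sweeps=50):
    """Learning phase: recompute Q from scratch on the whole accumulated
    batch by sweeping repeatedly over the stored transitions (a tabular
    stand-in for a batch learner such as fitted Q iteration)."""
    Q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in data:
            target = r if done else r + GAMMA * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] = target
    return Q

def growing_batch(env, actions, n_rounds=20, episodes_per_round=5):
    data, Q = [], defaultdict(float)
    for _ in range(n_rounds):
        for _ in range(episodes_per_round):        # explore with current policy
            data.extend(collect_episode(env, Q, actions))
        Q = refit(data, actions)                   # re-learn on ALL data so far
    return Q
```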
“…For example, the growing batch approach could be classified as an online method (it interacts with the system like an online method and incrementally improves its policy as new experience becomes available) as well as, from a data usage perspective, being seen as a batch algorithm, since it stores all experience and uses 'batch methods' to learn from these observations. Although FQI, like KADP and LSPI, has been proposed by Ernst as a pure batch algorithm working on a fixed set of samples, it can easily be adapted to the growing batch setting, as, for example, shown by Kalyanakrishnan and Stone (2007). This holds true for every 'pure' batch approach.…”
Section: Identifying Batch Algorithms
confidence: 99%
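The excerpt notes that FQI, posed by Ernst et al. as a pure batch method over a fixed sample set, carries over to the growing batch setting simply by re-running it on the accumulated samples after each round of interaction. Below is a hedged sketch of the core FQI regression loop using scikit-learn's ExtraTreesRegressor (tree ensembles are the regressor family associated with Ernst-style FQI); the state/action encoding and all constants are illustrative assumptions, not the cited authors' code.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor  # tree-ensemble regressor

GAMMA = 0.99   # illustrative discount factor

def fitted_q_iteration(transitions, actions, n_iterations=30):
    """Sketch of FQI: `transitions` is a list of (s, a, r, s_next, done)
    with states as 1-D feature arrays and discrete numeric actions."""
    X = np.array([np.append(s, a) for s, a, _, _, _ in transitions])
    R = np.array([r for _, _, r, _, _ in transitions], dtype=float)
    S_next = [s_next for _, _, _, s_next, _ in transitions]
    done = np.array([d for _, _, _, _, d in transitions], dtype=float)

    model = None
    for _ in range(n_iterations):
        if model is None:
            targets = R                            # first iterate: immediate reward only
        else:
            # max over actions of the previous iterate's prediction at s'
            q_next = np.column_stack([
                model.predict(np.array([np.append(s, a) for s in S_next]))
                for a in actions])
            targets = R + GAMMA * (1.0 - done) * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return model   # Q(s, a) is approximated by model.predict([np.append(s, a)])
```

In a growing batch loop, this function would simply be called again on the extended transition list after each batch of new episodes, which is the adaptation the excerpt attributes to Kalyanakrishnan and Stone (2007).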