2012
DOI: 10.1007/978-3-642-33492-4_6

Policy Search in a Space of Simple Closed-form Formulas: Towards Interpretability of Reinforcement Learning

Abstract: In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem where each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach…
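To make the abstract's idea concrete, here is a minimal sketch (in Python, not the authors' code) of a formula-based policy: a closed-form scoring formula ranks the available actions, the policy plays the top-ranked action, and its quality is estimated by Monte Carlo rollouts. The toy one-step problem, the two candidate formulas, and all names below are illustrative assumptions.

import random

# Minimal sketch (not the authors' implementation): a policy is represented by a
# simple closed-form formula that scores actions; the policy plays the action
# with the highest score, and its return is estimated by Monte Carlo rollouts.

def make_formula_policy(formula):
    """Turn a closed-form scoring formula f(state, action) into a greedy policy."""
    return lambda state, actions: max(actions, key=lambda a: formula(state, a))

def simulate_episode(policy):
    """One rollout of a toy one-step problem: state x ~ U[0,1], reward depends on x."""
    x = random.random()
    action = policy(x, [0, 1])
    return x if action == 0 else 1.0 - x

def monte_carlo_return(policy, n_rollouts=2000):
    """Estimate the policy's expected return by averaging independent rollouts."""
    return sum(simulate_episode(policy) for _ in range(n_rollouts)) / n_rollouts

# Two candidate formulas: a constant preference and a state-dependent one.
constant = make_formula_policy(lambda x, a: -a)                   # always picks action 0
adaptive = make_formula_policy(lambda x, a: x if a == 0 else 1.0 - x)

print(monte_carlo_return(constant))   # close to 0.5
print(monte_carlo_return(adaptive))   # close to 0.75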

Cited by 23 publications (17 citation statements)
References 22 publications (19 reference statements)
“…Note that such exact simulations are usually not available in industry. Similarly, in (Maes et al., 2012) Monte Carlo simulations were drawn to identify the best policies. However, the policy search itself was performed by formalizing a search over a space of simple closed-form formulas as a multi-armed bandit problem.…”
Section: Related Work
confidence: 99%
“…This work introduces a genetic programming (GP) approach for autonomously learning interpretable reinforcement learning (RL) policies from previously recorded state transitions. Despite the search for interpretable RL policies being of high academic and industrial interest, little has been published concerning human-interpretable and understandable policies trained by data-driven learning methods (Maes, Fonteneau, Wehenkel, and Ernst, 2012). Recent research results show that using fuzzy rules in batch RL settings can be considered an adequate solution to this task (Hein, Hentschel, Runkler, and Udluft, 2017b).…”
Section: Introduction
confidence: 99%
“…In order to approximately solve (4), we adopt the formalism of multiarmed bandits and proceed in two steps: first, we construct a finite set of candidate algorithms (Section IV-A), and then treat each of these algorithms as an arm and use a multiarmed bandit policy to select how to allocate computational time to the performance estimation of the different algorithms (Section IV-B). It is worth mentioning that this two-step approach follows a general methodology for automatic discovery that we already successfully applied to multiarmed bandit policy discovery [12], [13], reinforcement learning policy discovery [14], and optimal control policy discovery [15].…”
Section: Bandit-based Algorithm Discovery
confidence: 99%
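The second step described in this excerpt, spreading a simulation budget over a finite set of candidate algorithms with a multiarmed bandit policy, could be sketched roughly as follows. UCB1 is used here as one standard bandit index; the candidate list and the evaluate stub are hypothetical placeholders, not the paper's actual benchmark.

import math
import random

# Rough sketch of the allocation step: treat each candidate algorithm as an arm
# and let a bandit policy (UCB1 here, one common choice) decide which candidate
# receives the next noisy performance evaluation.

def evaluate(candidate):
    """One noisy performance measurement in [0, 1] (stand-in for a simulation)."""
    return min(1.0, max(0.0, random.gauss(candidate["true_score"], 0.2)))

def ucb1_allocate(candidates, budget):
    counts = [0] * len(candidates)
    totals = [0.0] * len(candidates)
    for t in range(1, budget + 1):
        if 0 in counts:                      # make sure every arm is tried once
            arm = counts.index(0)
        else:                                # otherwise pick the arm with the largest UCB1 index
            arm = max(range(len(candidates)),
                      key=lambda k: totals[k] / counts[k]
                      + math.sqrt(2.0 * math.log(t) / counts[k]))
        totals[arm] += evaluate(candidates[arm])
        counts[arm] += 1
    best = max(range(len(candidates)), key=lambda k: totals[k] / counts[k])
    return candidates[best], counts

candidates = [{"name": f"algo_{i}", "true_score": s}
              for i, s in enumerate([0.4, 0.55, 0.7])]
print(ucb1_allocate(candidates, budget=2000))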
“…One simple approach to approximately solve (4) is to estimate the objective function through an empirical mean computed over a finite set of training problems drawn from the problem distribution (their Eq. (15)), where each term denotes one outcome of the algorithm, run with the given budget, on one training problem. To solve (4), one can then compute this approximated objective function for all algorithms and simply return the algorithm with the highest score.…”
Section: B. Bandit-based Algorithm Discovery
confidence: 99%
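The uniform-allocation baseline described in this excerpt, scoring every candidate by the empirical mean of its outcomes on a fixed set of training problems and returning the argmax, could look roughly like this; run_algorithm, the problem set, and the scores are hypothetical stand-ins, not the paper's code.

import random

# Stand-in for the empirical-mean baseline: score every candidate algorithm by
# the average of its outcomes on a fixed set of training problems, then return
# the highest-scoring candidate.

def run_algorithm(algorithm, problem):
    """One noisy outcome of `algorithm` on `problem` (stands in for a simulation)."""
    return random.gauss(algorithm["true_score"] - problem["difficulty"], 0.1)

def best_by_empirical_mean(algorithms, training_problems):
    def score(algorithm):
        outcomes = [run_algorithm(algorithm, p) for p in training_problems]
        return sum(outcomes) / len(outcomes)
    return max(algorithms, key=score)

problems = [{"difficulty": random.uniform(0.0, 0.3)} for _ in range(50)]
algorithms = [{"name": "algo_a", "true_score": 0.5},
              {"name": "algo_b", "true_score": 0.65}]
print(best_by_empirical_mean(algorithms, problems)["name"])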
“…Learning such RL controllers in a way that produces interpretable high-level controllers is the scope of this paper and the proposed approach. Especially for real-world industry problems, this is of high interest, since interpretable RL policies are expected to yield higher acceptance from domain experts than black-box solutions (Maes, Fonteneau, Wehenkel, and Ernst, 2012).…”
Section: Introduction
confidence: 99%