Approximate dynamic programming with a fuzzy parameterization

Buşoniu, Lucian; Ernst, Damien; De Schutter, Bart; Babuška, Robert

doi:10.1016/j.automatica.2010.02.006

Cited by 55 publications (38 citation statements); references 18 publications

“…, 15π} Since an exact optimal solution for the inverted pendulum problem is not known, in order to approximate the regret (3), a near-optimal solution is computed instead. To this end, the fuzzy Q-iteration algorithm [7] is modified to work for the sparsely stochastic systems considered in this paper, and applied to the inverted pendulum using a very accurate approximator over the state space. Figure 3 (top) reports the (approximate) regret of the three algorithms, averaged over the set X₀.…”

Section: Results (mentioning)

Optimistic planning for sparsely stochastic systems

Buşoniu, Munos, De Schutter, et al. 2011

2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Self Cite

Abstract: We propose an online planning algorithm for finite-action, sparsely stochastic Markov decision processes, in which the random state transitions can only end up in a small number of possible next states. The algorithm builds a planning tree by iteratively expanding states, where each expansion exploits sparsity to add all possible successor states. Each state to expand is actively chosen to improve the knowledge about action quality, and this allows the algorithm to return a good action after a strictly limited number of expansions. More specifically, the active selection method is optimistic in that it chooses the most promising states first, so the novel algorithm is called optimistic planning for sparsely stochastic systems. We note that the new algorithm can also be seen as model-predictive (receding-horizon) control. The algorithm obtains promising numerical results, including the successful online control of a simulated HIV infection with stochastic drug effectiveness.
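The planning loop described in this abstract can be illustrated with a small sketch. It is not the algorithm from the paper: the node-selection rule (descend along the action with the largest upper bound, then into the successor with the largest probability-weighted gap between bounds), the bounds themselves (rewards assumed to lie in [0, 1], unexpanded leaves bounded by 0 and 1/(1 − γ)), returning the root action with the best lower bound, and the names `plan`, `expand`, `select_leaf` and `line_model` are all assumptions made for illustration. What it shares with the abstract is that every expansion adds all possible successor states of a node, which stays affordable only because each transition has few outcomes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    state: object
    # action -> list of (probability, reward, child Node); an empty dict marks a leaf
    children: dict = field(default_factory=dict)


def q_bounds(branches, gamma):
    """Lower/upper bounds on Q(x, a) from the expanded successors of one action."""
    lo = hi = 0.0
    for p, r, child in branches:
        c_lo, c_hi = bounds(child, gamma)
        lo += p * (r + gamma * c_lo)
        hi += p * (r + gamma * c_hi)
    return lo, hi


def bounds(node, gamma):
    """Lower/upper bounds on V(node), assuming (hypothetically) rewards in [0, 1]."""
    if not node.children:
        return 0.0, 1.0 / (1.0 - gamma)
    qs = [q_bounds(b, gamma) for b in node.children.values()]
    return max(lo for lo, _ in qs), max(hi for _, hi in qs)


def gap(node, gamma):
    lo, hi = bounds(node, gamma)
    return hi - lo


def select_leaf(root, gamma):
    """Descend optimistically from the root to the leaf to expand next (assumed rule)."""
    node = root
    while node.children:
        # Optimistic action: the one with the largest upper bound.
        a = max(node.children, key=lambda act: q_bounds(node.children[act], gamma)[1])
        # Among its few successors, follow the largest probability-weighted bound gap.
        node = max(node.children[a], key=lambda br: br[0] * gap(br[2], gamma))[2]
    return node


def expand(node, model, actions):
    """Add all (few) possible successors of node, for every action."""
    for a in actions:
        node.children[a] = [(p, r, Node(s2)) for p, s2, r in model(node.state, a)]


def plan(state, model, actions, gamma=0.9, budget=50):
    """Expand `budget` (>= 1) nodes, then return the root action with the best lower bound."""
    root = Node(state)
    for _ in range(budget):
        expand(select_leaf(root, gamma), model, actions)
    return max(root.children, key=lambda a: q_bounds(root.children[a], gamma)[0])


def line_model(s, a):
    """Hypothetical sparse model on states 0..10: the move succeeds w.p. 0.8,
    otherwise the state stays put; reward 1 whenever the next state is 10."""
    moved = min(max(s + a, 0), 10)
    return [(0.8, moved, 1.0 if moved == 10 else 0.0),
            (0.2, s, 1.0 if s == 10 else 0.0)]


print(plan(9, line_model, actions=(-1, 1)))  # prints 1
```

For brevity the sketch recomputes all bounds recursively on every call, which costs time proportional to the tree size at each step; a practical implementation would cache the bounds and refresh them only along the path to the newly expanded node.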

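The first citation statement above mentions the fuzzy Q-iteration algorithm [7]. As a rough illustration only, the sketch below implements a synchronous fuzzy Q-iteration for a deterministic system with a scalar state: Q-values are stored as parameters θ[i][j] at the centers x_i of normalized triangular membership functions φ_i and a discrete action set u_j, and each iteration applies θ[i][j] ← ρ(x_i, u_j) + γ max_j' Σ_i' φ_i'(f(x_i, u_j)) θ[i'][j']. The deterministic setting, the clipping of states to the grid, the stopping threshold `tol`, and the toy dynamics `f` and reward `rho` in the usage lines are assumptions; the sketch does not reproduce the stochastic modification described in the quoted statement.

```python
import bisect


def triangular_mfs(x, centers):
    """Normalized triangular membership degrees of scalar x over sorted centers.
    States outside the grid are clipped to the nearest end (an assumption).
    At most two neighboring degrees are nonzero, and they sum to 1."""
    x = min(max(x, centers[0]), centers[-1])
    phi = [0.0] * len(centers)
    k = min(bisect.bisect_right(centers, x) - 1, len(centers) - 2)
    w = (x - centers[k]) / (centers[k + 1] - centers[k])
    phi[k], phi[k + 1] = 1.0 - w, w
    return phi


def fuzzy_q_iteration(centers, actions, f, rho, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Synchronous fuzzy Q-iteration for deterministic dynamics x' = f(x, u)."""
    n, m = len(centers), len(actions)
    # Memberships of every successor f(x_i, u_j) and the rewards, computed once.
    phi_next = [[triangular_mfs(f(x, u), centers) for u in actions] for x in centers]
    r = [[rho(x, u) for u in actions] for x in centers]
    theta = [[0.0] * m for _ in range(n)]
    for _ in range(max_iter):
        new = [[r[i][j] + gamma * max(
                    sum(p * theta[k][jp] for k, p in enumerate(phi_next[i][j]) if p)
                    for jp in range(m))
                for j in range(m)]
               for i in range(n)]
        delta = max(abs(new[i][j] - theta[i][j]) for i in range(n) for j in range(m))
        theta = new
        if delta < tol:  # stop once no parameter changes by more than tol
            break
    return theta


def greedy_action(x, theta, centers, actions):
    """Action maximizing the interpolated Q(x, u_j) = sum_i phi_i(x) * theta[i][j]."""
    phi = triangular_mfs(x, centers)
    q = [sum(p * theta[i][j] for i, p in enumerate(phi)) for j in range(len(actions))]
    return actions[max(range(len(actions)), key=q.__getitem__)]


# Hypothetical toy problem: drive a scalar state toward 0.
def f(x, u):
    return min(max(x + u, -1.0), 1.0)  # hypothetical dynamics


def rho(x, u):
    return -x * x  # hypothetical reward


centers = [-1.0 + 0.1 * i for i in range(21)]
actions = [-0.1, 0.0, 0.1]
theta = fuzzy_q_iteration(centers, actions, f, rho)
print(greedy_action(0.5, theta, centers, actions))  # prints -0.1
```

Because the policy is obtained by maximizing over a small discrete action set, it is piecewise constant in the state, which is one reason such policies can produce non-smooth control signals.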

“…. , N. This value iteration is guaranteed to converge (Buşoniu et al., 2010) and terminates when the following condition is satisfied:…”

Section: Preliminaries (mentioning)

Optimal Control via Reinforcement Learning with Symbolic Policy Approximation

2017

Self Cite

Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper addresses the problem of finding a smooth policy based on the value function represented by means of a basis-function approximator. We first show that policies derived directly from the value function or represented explicitly by the same type of approximator lead to inferior control performance, manifested by non-smooth control signals and steady-state errors. We then propose a novel method to construct a smooth policy represented by an analytic equation, obtained by means of symbolic regression. The proposed method is illustrated on a reference-tracking problem of a 1-DOF robot arm operating under the influence of gravity. The results show that the analytic control law performs at least as well as the original numerically approximated policy, while it leads to much smoother control signals. In addition, the analytic function is readable (as opposed to black-box approximators) and can be used in further analysis and synthesis of the closed loop.

“…It can also be simply combined with fuzzy logic and provide the relationship between the states and the accessible action, which is the same as creating the fuzzy logic "if ... then" engine [23][24][25].…”

Section: Introduction (mentioning)
