“…The idea motivating this choice is the following: the MP kernel allows capturing possible modes of the system that are polynomial functions in x, which are typical in mechanical systems [16], while the SE kernel models more complex behaviors not captured by the polynomial kernel.…”
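The combination described above, a polynomial (MP) kernel summed with a Squared Exponential (SE) kernel, can be sketched in a few lines, since a sum of valid kernels is itself a valid kernel. A minimal numpy illustration (function names and hyperparameters such as `lengthscale` and `degree` are illustrative choices, not taken from the paper):

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared Exponential kernel: encodes smooth, flexible behavior."""
    sq_dist = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def mp_kernel(x1, x2, degree=2, variance=1.0):
    """Polynomial kernel: captures modes that are polynomial functions in x."""
    return variance * (x1 @ x2.T + 1.0) ** degree

def combined_kernel(x1, x2):
    """Sum of the two kernels; the sum of valid kernels is a valid kernel."""
    return mp_kernel(x1, x2) + se_kernel(x1, x2)
```

The polynomial term lets the GP posterior extrapolate the polynomial structure typical of mechanical systems, while the SE term absorbs residual behavior the polynomial part cannot represent.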
In this paper, we propose a Model-Based Reinforcement Learning (MBRL) algorithm for Partially Measurable Systems (PMS), i.e., systems where the state cannot be directly measured but must be estimated through proper state observers. The proposed algorithm, named Monte Carlo Probabilistic Inference for Learning COntrol for Partially Measurable Systems (MC-PILCO4PMS), relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to update the policy parameters. W.r.t. previous GP-based MBRL algorithms, MC-PILCO4PMS explicitly models the presence of state observers during policy optimization, allowing it to handle PMS. The effectiveness of the proposed algorithm has been tested both in simulation and on two real systems.
“…Thus, in this setting, it is possible to analytically compute the policy gradient from long-term predictions. However, as already mentioned in Section I, the Gaussian approximation performed in moment matching is also the cause of two main weaknesses: (i) The computation of the two moments has been performed assuming the use of SE kernels, which might lead to poor generalization properties in data that have not been seen during training [9], [10], [11], [12]. (ii) Moment matching allows modeling only unimodal distributions, which might be a too restrictive approximation of the real system behavior.…”
Section: B. GPR and One-Step-Ahead Predictions
mentioning
confidence: 99%
“…The proposed speed-integration model learns only d_x/2 GPs, each of which models the evolution of a distinct velocity component Δ_t^(i_k), with i_k ∈ I_q̇. Then, the evolution of the position change Δ_t^(i_k), with i_k ∈ I_q, is computed according to (9) and the predicted change in velocity.…”
Section: A. Model Learning
mentioning
confidence: 99%
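The speed-integration model above lets the GPs predict only the velocity changes, with the position changes obtained by integrating the velocity evolution. A minimal sketch of one prediction step, assuming equation (9) is a constant-acceleration integration over a sampling time `Ts` (a plausible reading of the snippet; variable names are hypothetical):

```python
import numpy as np

def speed_integration_step(q, q_dot, delta_q_dot, Ts):
    """One-step-ahead prediction: the GP supplies the velocity change
    delta_q_dot; the position change follows from integrating the velocity
    under a constant-acceleration assumption over the sampling interval."""
    q_dot_next = q_dot + delta_q_dot               # GP-predicted velocity update
    delta_q = Ts * q_dot + (Ts / 2.0) * delta_q_dot  # integrated position change
    q_next = q + delta_q
    return q_next, q_dot_next
```

This halves the number of GPs to train (d_x/2 instead of d_x) and enforces the kinematic consistency between positions and velocities by construction.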
“…(ii) The computation of the moments is shown to be tractable only when considering Squared Exponential (SE) kernels and differentiable cost functions. In particular, the limitation on the kernel choice might be very stringent, as GPs with the SE kernel impose smoothness properties on the posterior estimator and might show poor generalization properties on data that have not been seen during training [9], [10], [11], [12].…”
In this paper, we present a Model-Based Reinforcement Learning algorithm named Monte Carlo Probabilistic Inference for Learning COntrol (MC-PILCO). The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient. This defines a framework in which we ablate the choice of the following components: (i) the selection of the cost function, (ii) the optimization of policies using dropout, and (iii) improved data efficiency through the use of structured kernels in the GP models. The combination of the aforementioned aspects dramatically affects the performance of MC-PILCO. Numerical comparisons in a simulated cart-pole environment show that MC-PILCO exhibits better data efficiency and control performance than state-of-the-art GP-based MBRL algorithms. Finally, we apply MC-PILCO to real systems, considering in particular systems with partially measurable states. We discuss the importance of modeling both the measurement system and the state estimators during policy optimization. The effectiveness of the proposed solutions has been tested in simulation and on two real systems, a Furuta pendulum and a ball-and-plate.
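The Monte Carlo policy-gradient idea in the abstract, simulating particles through the learned stochastic model and differentiating the cumulative cost, can be sketched with the reparameterization trick. In this illustrative snippet, `dynamics_mean` and `dynamics_std` stand in for the GP posterior mean and standard deviation (all names are hypothetical, not the paper's API):

```python
import torch

def mc_policy_gradient(policy, dynamics_mean, dynamics_std, x0_batch, horizon, cost):
    """Monte Carlo estimate of the expected cumulative cost and its gradient:
    roll a batch of particles through the stochastic one-step model using
    reparameterized sampling, then backpropagate through the whole rollout."""
    x = x0_batch
    total_cost = 0.0
    for _ in range(horizon):
        u = policy(x)
        mean, std = dynamics_mean(x, u), dynamics_std(x, u)
        # reparameterized sample keeps the rollout differentiable w.r.t. policy params
        x = mean + std * torch.randn_like(mean)
        total_cost = total_cost + cost(x).mean()
    total_cost.backward()  # gradients accumulate in the policy parameters
    return total_cost
```

Unlike moment matching, the particle distribution at each step is free to be multimodal, at the price of a stochastic (rather than analytic) gradient estimate.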
“…with the mean value µ_θ(k + i + 1|k) and variance Σ_θ(k + i + 1|k) calculations similar to (30). The difference between models (46) and (28) is that the former depends on the actual internal state α(k + i), while the latter uses the estimated internal state α̂(k + i|k). Model (28) is actually used for θ trajectory prediction through the MPC formulation.…”
Ranging from cart-pole systems and autonomous bicycles to bipedal robots, control of these underactuated balance robots aims to achieve both external (actuated) subsystem trajectory tracking and internal (unactuated) subsystem balancing tasks with limited actuation authority. This paper proposes a learning model-based control framework for underactuated balance robots. The key idea to simultaneously achieve the tracking and balancing tasks is to design control strategies in slow- and fast-time scales, respectively. In the slow-time scale, model predictive control (MPC) is used to generate the desired internal subsystem trajectory that encodes the external subsystem tracking performance and control input. In the fast-time scale, the actual internal trajectory is stabilized to the desired internal trajectory by an inverse dynamics controller. The coupling effects between the external and internal subsystems are captured through the planned internal trajectory profile and the dual structural properties of the robotic systems. The control design is based on Gaussian process (GP) regression models that are learned from experiments without the need for a priori knowledge of the robot dynamics or a successful balance demonstration. The GPs provide estimates of the modeling uncertainties of the robotic systems, and these uncertainty estimates are incorporated into the MPC design to enhance control robustness to modeling errors. The learning-based control design is analyzed with guaranteed stability and performance. The proposed design is demonstrated by experiments on a Furuta pendulum and an autonomous bikebot.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.