Abstract: This paper addresses model learning for the task of navigating a ball to a goal state in a circular maze environment with two degrees of freedom. The motion of the ball in the maze is influenced by several nonlinear effects, such as dry friction and contacts, which are difficult to model physically. We propose a semiparametric model to estimate the motion dynamics of the ball based on Gaussian Process Regression equipped with basis functions obtained from physics first …
“…When prior knowledge about the system dynamics is available, for example given by physics first principles, the so-called physically inspired (PI) kernel can be derived. The PI kernel is a linear kernel defined on suitable basis functions $\phi(x)$, see for instance [6]. More precisely, $\phi(x) \in \mathbb{R}^{d_\phi}$ is a, possibly nonlinear, transformation of the GP input $x$ determined by the physical model.…”
Section: Squared Exponential (SE) · mentioning
confidence: 99%
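To make the PI kernel concrete, here is a minimal NumPy sketch. The basis functions `phi` (here $[\dot\theta, \sin\theta, u]$ for a pendulum-like input $[\theta, \dot\theta, u]$) and the diagonal entries of $\Sigma_{PI}$ are hypothetical choices for illustration, not the ones used in the cited works. The sketch also checks a consequence of the linear-kernel structure: the Gram matrix has rank at most $d_\phi$.

```python
import numpy as np


def phi(x):
    """Hypothetical physics-inspired features (illustration only).

    x has columns [theta, theta_dot, u]; the features are assumed to be
    [theta_dot, sin(theta), u], so d_phi = 3.
    """
    theta, theta_dot, u = x[:, 0], x[:, 1], x[:, 2]
    return np.stack([theta_dot, np.sin(theta), u], axis=1)  # shape (n, d_phi)


def k_pi(xa, xb, sigma_diag):
    """Linear kernel on the basis functions: phi(xa)^T Sigma_PI phi(xb)."""
    Sigma = np.diag(sigma_diag)  # diagonal Sigma_PI -> only d_phi hyperparameters
    return phi(xa) @ Sigma @ phi(xb).T


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = k_pi(X, X, sigma_diag=np.array([1.0, 0.5, 2.0]))

# Symmetric PSD Gram matrix whose rank cannot exceed d_phi = 3.
print(np.linalg.matrix_rank(K))
```

Because this kernel is equivalent to Bayesian linear regression on $\phi(x)$ with weight prior covariance $\Sigma_{PI}$, it can only represent functions in the span of the basis functions, which is why it is usually paired with a nonparametric kernel.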
“…Then we have $k_{PI}(x_{t_j}, x_{t_k}) = \phi^T(x_{t_j})\,\Sigma_{PI}\,\phi(x_{t_k})$, where $\Sigma_{PI}$ is a $d_\phi \times d_\phi$ positive-definite matrix whose elements are the $k_{PI}$ hyperparameters; to limit the number of hyperparameters, a standard choice is to take $\Sigma_{PI}$ diagonal. To compensate for possible inaccuracies of the physical model, it is common to combine $k_{PI}$ with an SE kernel, obtaining so-called semi-parametric kernels [17,6], expressed as…”
Section: Squared Exponential (SE) · mentioning
confidence: 99%
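The quoted expression is truncated, so the sketch below assumes the common additive form $k_{SP} = k_{PI} + k_{SE}$. It is a minimal NumPy illustration with hypothetical 1-D features $\phi(x) = [x, \sin x]$, fixed hyperparameters, and synthetic data whose small $\cos(3x)$ term is deliberately missing from the physical features, so the SE component has something to absorb in the GP posterior mean.

```python
import numpy as np


def phi(x):
    # Hypothetical physics-inspired features for a 1-D input (illustration only).
    return np.stack([x, np.sin(x)], axis=1)


def k_pi(xa, xb, sigma_diag):
    # Linear (PI) kernel with diagonal Sigma_PI.
    return phi(xa) @ np.diag(sigma_diag) @ phi(xb).T


def k_se(xa, xb, lengthscale, variance):
    # Squared exponential kernel on scalar inputs.
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)


def k_sp(xa, xb):
    # Assumed semiparametric kernel: PI part plus SE part for model errors.
    return k_pi(xa, xb, np.array([1.0, 1.0])) + k_se(xa, xb, 0.5, 0.1)


def gp_mean(X, y, Xs, noise=1e-2):
    # GP posterior mean with zero prior mean and Gaussian noise variance `noise`.
    K = k_sp(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return k_sp(Xs, X) @ alpha


def f(x):
    # Synthetic "true" system: PI-representable part plus an unmodeled term.
    return 0.8 * x + 1.5 * np.sin(x) + 0.2 * np.cos(3 * x)


rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 40)
y = f(X) + 0.05 * rng.normal(size=X.size)
Xs = np.linspace(-2.5, 2.5, 11)
err = np.max(np.abs(gp_mean(X, y, Xs) - f(Xs)))
print(err)
```

With only the PI kernel the posterior mean is restricted to $a x + b \sin x$; adding the SE term lets the GP fit the residual locally while keeping the physical structure for extrapolation.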
“…model-free RL algorithms. In particular, remarkable results have been obtained by relying on Gaussian Processes (GPs) [2] to model the system dynamics, see for instance [3,4,5,6,7]. In this paper, we consider the application of MBRL algorithms to PMS, i.e., systems where only a subset of the state components can be directly measured, and the remaining components can be estimated through proper state observers.…”
In this paper, we propose a Model-Based Reinforcement Learning (MBRL) algorithm for Partially Measurable Systems (PMS), i.e., systems where the state cannot be directly measured but must be estimated through proper state observers. The proposed algorithm, named Monte Carlo Probabilistic Inference for Learning COntrol for Partially Measurable Systems (MC-PILCO4PMS), relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to update the policy parameters. Compared with previous GP-based MBRL algorithms, MC-PILCO4PMS explicitly models the presence of state observers during policy optimization, which allows it to deal with PMS. The effectiveness of the proposed algorithm has been tested both in simulation and on two real systems.
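The core idea, simulating particles through the learned model while reproducing the observer that the real controller will use, can be sketched as follows. This is a hypothetical toy, not the authors' implementation: a noisy double integrator stands in for the GP model, only the position is "measured", and the velocity passed to the policy comes from a finite-difference plus low-pass observer. The gains, noise levels, filter coefficient and cost are illustrative assumptions.

```python
import numpy as np


def mc_cost(gains, n_particles=200, horizon=60, dt=0.05, meas_std=0.01, seed=0):
    """Monte Carlo estimate of the cumulative expected cost of a linear policy.

    Hypothetical toy setup: a noisy double integrator stands in for the learned
    GP model; only the position is measured, and the velocity given to the
    policy comes from a finite-difference + low-pass observer.
    """
    kp, kd = gains
    rng = np.random.default_rng(seed)              # same seed -> common random numbers
    p = rng.normal(1.0, 0.05, n_particles)         # initial positions, target is 0
    v = np.zeros(n_particles)
    p_meas_prev = p + meas_std * rng.normal(size=n_particles)
    v_hat = np.zeros(n_particles)
    total = 0.0
    for _ in range(horizon):
        p_meas = p + meas_std * rng.normal(size=n_particles)  # only position is measured
        v_fd = (p_meas - p_meas_prev) / dt                    # finite-difference velocity
        v_hat = 0.7 * v_hat + 0.3 * v_fd                      # low-pass filter (observer)
        p_meas_prev = p_meas
        u = np.clip(-kp * p_meas - kd * v_hat, -5.0, 5.0)     # policy sees estimates only
        v = v + dt * u + 0.01 * rng.normal(size=n_particles)  # stand-in for a GP sample
        p = p + dt * v
        total += np.mean(p ** 2)                              # particle-averaged cost
    return total


print(mc_cost((0.0, 0.0)), mc_cost((4.0, 3.0)))
```

In MC-PILCO the policy parameters are then updated by gradient descent on such a Monte Carlo estimate, with gradients obtained by differentiating through the sampled particles; the sketch above only evaluates the cost for two fixed gain vectors.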
“…However, physical parameters of the ODE model and hyperparameters of GPR are learned separately, which may yield a suboptimal model. In [9], [10], continuous-time rather than discrete-time state transition dynamics are learned: GPR is used to learn the mapping from positional state and action to acceleration.…”
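A continuous-time model of this kind predicts accelerations and leaves the state update to a numerical integrator. The sketch below is a hypothetical illustration: a GP posterior mean with a fixed SE kernel is fit to samples of an assumed pendulum-like acceleration $-9.81 \sin q + u$ with inputs $(q, u)$, and is then rolled out with semi-implicit Euler. The dynamics, kernel hyperparameters and step size are all assumptions.

```python
import numpy as np


def se_kernel(A, B, ls=1.0, var=10.0):
    # Squared exponential kernel on row-vector inputs.
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2 / ls ** 2, axis=-1)
    return var * np.exp(-0.5 * d2)


def true_acc(q, u):
    # Hypothetical pendulum-like dynamics used only to generate training data.
    return -9.81 * np.sin(q) + u


rng = np.random.default_rng(0)
Z = np.column_stack([rng.uniform(-2, 2, 200), rng.uniform(-1, 1, 200)])  # inputs (q, u)
acc = true_acc(Z[:, 0], Z[:, 1])                                         # targets: acceleration
alpha = np.linalg.solve(se_kernel(Z, Z) + 1e-4 * np.eye(len(Z)), acc)


def gp_acc(q, u):
    # GP posterior mean of the acceleration at a single (q, u).
    return (se_kernel(np.array([[q, u]]), Z) @ alpha)[0]


def rollout(acc_fn, q0=0.5, dq0=0.0, u=0.2, dt=0.01, steps=100):
    q, dq = q0, dq0
    for _ in range(steps):
        dq = dq + dt * acc_fn(q, u)  # acceleration model updates the velocity
        q = q + dt * dq              # semi-implicit Euler: new velocity moves the position
    return q, dq


print(rollout(gp_acc), rollout(true_acc))
```

Learning accelerations instead of full discrete-time transitions keeps the kinematic relation between position and velocity exact, so the GP only has to capture the part of the dynamics that is actually unknown.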
Recently, data-driven learning of dynamic systems has become a promising approach because it requires no physical knowledge. Pure machine learning approaches such as Gaussian process regression (GPR) learn a dynamic model from data and discard all physical knowledge about the system, moving from one extreme, namely methods that optimize parametric physical models derived from physical laws, to the other. GPR is highly flexible and can model any dynamics as long as they are locally smooth, but it cannot generalize well to unexplored regions with little or no training data. The analytic physical model derived under modeling assumptions is only an abstract approximation of the true system, but it generalizes globally. Hence, the optimal learning strategy is to combine GPR with the analytic physical model. This paper proposes a method to learn dynamic systems using GPR with analytic ordinary differential equations (ODEs) as prior information. The one-time-step integration of the analytic ODEs is used as the mean function of the Gaussian process prior. The parameters to be trained include the physical parameters of the analytic ODEs and the parameters of GPR. A novel method is proposed to learn all parameters simultaneously; it is realized by fully Bayesian GPR and is more likely to yield an optimal model. Standard Gaussian process regression, the ODE method and an existing method from the literature are chosen as baselines to verify the benefit of the proposed method. Predictive performance is evaluated by both one-time-step and long-term prediction. Simulations of the cart-pole system demonstrate that the proposed method has better predictive performance.
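The structure described in this abstract, an ODE integration step as the GP prior mean with the GP correcting the residual, can be sketched in NumPy. Everything concrete is a hypothetical assumption: the scalar ODE $\dot x = -c\,x + u$, the unmodeled $0.3 \sin(2x)$ term in the synthetic data, the fixed SE hyperparameters, and a grid search maximizing the log marginal likelihood over $c$, which stands in for the fully Bayesian treatment used in the paper.

```python
import numpy as np

dt = 0.1


def ode_step(x, u, c):
    # One explicit-Euler step of the hypothetical analytic ODE dx/dt = -c*x + u.
    return x + dt * (-c * x + u)


def true_next(x, u):
    # Synthetic "real" system: the ODE with c = 1.5 plus an unmodeled term.
    return x + dt * (-1.5 * x + u + 0.3 * np.sin(2 * x))


def se(A, B, ls=0.8, var=1e-3):
    # Squared exponential kernel on row-vector inputs.
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1) / ls ** 2
    return var * np.exp(-0.5 * d2)


def log_marglik(r, K):
    # log N(r | 0, K) computed through a Cholesky factor K = L L^T.
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, r))
    return -0.5 * r @ a - np.log(np.diag(L)).sum() - 0.5 * len(r) * np.log(2 * np.pi)


rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-2, 2, 80), rng.uniform(-1, 1, 80)])  # (x_t, u_t)
y = true_next(X[:, 0], X[:, 1]) + 1e-3 * rng.normal(size=80)           # x_{t+1}
K = se(X, X) + 1e-6 * np.eye(80)                                        # noise variance 1e-6

# Physical parameter c chosen by maximizing the GP marginal likelihood of y
# with the ODE step as prior mean (grid search; GP hyperparameters kept fixed).
cs = np.linspace(0.5, 3.0, 26)
c_hat = max(cs, key=lambda c: log_marglik(y - ode_step(X[:, 0], X[:, 1], c), K))
alpha = np.linalg.solve(K, y - ode_step(X[:, 0], X[:, 1], c_hat))


def predict(Xs):
    # Posterior mean = ODE prior mean + GP correction fitted to the residuals.
    return ode_step(Xs[:, 0], Xs[:, 1], c_hat) + se(Xs, X) @ alpha


print(c_hat)
```

Because the physical parameter enters through the prior mean, it is scored by the same marginal likelihood as the GP, which is the sense in which both are learned jointly rather than in two separate stages.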
“…However, the performance of the proposed method gradually degrades over time as the thermal model deviates from the actual system. Recently, Bayesian estimation techniques have also been introduced to the system identification problem [18]-[22]. In particular, prior information is introduced into the identification process by designing a covariance, which is also known as a kernel in the machine learning literature.…”
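As a concrete instance of designing the covariance, kernel-based identification of a finite impulse response (FIR) model is sketched below. The tuned/correlated (TC) kernel, the impulse response used to generate data and all numeric values are assumptions for illustration, not taken from [18]-[22]; the estimate is the posterior mean of the impulse response under the Gaussian prior $g \sim \mathcal{N}(0, P)$.

```python
import numpy as np


def tc_kernel(n, c=1.0, lam=0.8):
    # Tuned/correlated (TC) kernel P[i, j] = c * lam**max(i, j): impulse-response
    # coefficients are a priori correlated and decay exponentially with the lag.
    idx = np.arange(1, n + 1)
    return c * lam ** np.maximum(idx[:, None], idx[None, :])


def fir_regressor(u, n):
    # Row t contains u[t-1], ..., u[t-n]; inputs before the record start are zero.
    N = len(u)
    Phi = np.zeros((N, n))
    for k in range(1, n + 1):
        Phi[k:, k - 1] = u[:N - k]
    return Phi


rng = np.random.default_rng(0)
n, N, sigma = 30, 200, 0.1
lags = np.arange(1, n + 1)
g_true = 0.8 ** lags * np.sin(0.5 * lags)        # hypothetical true impulse response
u = rng.normal(size=N)
Phi = fir_regressor(u, n)
y = Phi @ g_true + sigma * rng.normal(size=N)

# Posterior mean of g under the prior g ~ N(0, P) and y = Phi g + noise:
# g_hat = P Phi^T (Phi P Phi^T + sigma^2 I)^{-1} y
P = tc_kernel(n)
g_reg = P @ Phi.T @ np.linalg.solve(Phi @ P @ Phi.T + sigma ** 2 * np.eye(N), y)
g_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]    # unregularized least squares, for comparison
print(np.linalg.norm(g_reg - g_true), np.linalg.norm(g_ls - g_true))
```

The kernel is where the prior information lives: changing `lam` changes how fast the estimated response is allowed to decay, without changing the estimator itself.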
Overheating caused by the operation of implantable devices can damage the surrounding tissue. In applications like neural prostheses, a temperature increase of … °C could lead to irreversible damage to the subject. Predicting the overheating effect is therefore critical for maintaining safe operation. This work proposes a Bayesian recursive multi-step prediction method for implantable devices to predict the overheating effect. The proposed method achieves accurate prediction within a horizon at low complexity through model updating that iteratively minimizes a function of the j-step-ahead prediction error. At each time instant, the newly available input-output data are stored in a First In First Out (FIFO) queue of fixed length, and the model parameters are updated by iteratively minimizing the j-step-ahead prediction error on the new data. Moreover, regularization methods are introduced to improve prediction performance by taking the Bayesian interpretation of the parameters into account. Monte Carlo simulation studies indicate that the developed method is able to estimate the fundamental dynamics of the system when the prediction model is underparameterized, and that it is robust to measurement noise. For time-varying systems, the developed method can capture the system dynamics as the system varies. The proposed method is demonstrated on an in-vitro test vehicle, which shows that the temperature increase can be predicted with high accuracy and low complexity.
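The recursive scheme described in this abstract (a fixed-length FIFO buffer, refit on every new sample by minimizing a regularized j-step-ahead prediction error) can be sketched for a first-order ARX model. The model structure, the Gaussian-prior-style penalty pulling the parameters toward the previous estimate, the noise-free plant and all numbers are hypothetical; a grid search over `a` with closed-form `b` replaces the iterative minimization of the original method for brevity.

```python
import numpy as np
from collections import deque


def fit_jstep(ys, us, j, theta0, lam=1e-3, grid=np.linspace(0.0, 0.99, 100)):
    """Minimize the regularized j-step-ahead prediction error of the ARX(1)
    model y[t+1] = a*y[t] + b*u[t] over one data window.

    For fixed a the j-step predictor is linear in b, so b has a closed form;
    a is found by grid search (a simplification of iterative minimization).
    """
    a0, b0 = theta0
    y, u = np.asarray(ys), np.asarray(us)
    T = len(y) - j                      # number of j-step windows in the buffer
    best = None
    for a in grid:
        # g[t] = sum_{i<j} a^(j-1-i) u[t+i]: input contribution to y[t+j]
        g = sum(a ** (j - 1 - i) * u[i:i + T] for i in range(j))
        r = y[j:j + T] - a ** j * y[:T]
        b = (g @ r + lam * b0) / (g @ g + lam)
        cost = np.sum((r - b * g) ** 2) + lam * ((a - a0) ** 2 + (b - b0) ** 2)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]


rng = np.random.default_rng(0)
a_true, b_true, N_buf, j = 0.9, 0.5, 50, 3
buf_y, buf_u = deque(maxlen=N_buf), deque(maxlen=N_buf)  # FIFO queues of fixed length
theta = (0.5, 0.1)                                       # initial prior guess for (a, b)
y = 0.0
for t in range(200):
    u = rng.normal()
    buf_y.append(y)
    buf_u.append(u)
    if len(buf_y) > j + 5:
        theta = fit_jstep(buf_y, buf_u, j, theta0=theta)  # prior = previous estimate
    y = a_true * y + b_true * u                          # noise-free plant for the demo
print(theta)
```

Using the previous estimate as the prior mean is what makes the update recursive: each refit only needs the current buffer, and the penalty keeps the parameters from jumping when the buffer is short or uninformative.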