Model-Based Policy Search Using Monte Carlo Gradient Estimation With Real Systems Application

Amadio, Fabio; Libera, Alberto Dalla; Oboe, Roberto; Nikovski, Daniel; Carli, Ruggero; Romeres, Diego

doi:10.1109/tro.2022.3184837

Cited by 10 publications

(6 citation statements)

References 23 publications


“…In this section, we evaluate the performance of the control policies learned by the different VF-MC-PILCO setups and by MC-PILCO4PMS. Notice that MC-PILCO4PMS achieved results comparable to or better than other state-of-the-art GP-based MBRL algorithms, see [11]. The cumulative costs and success rates obtained at each trial in the 50 experiments are reported in Fig.…”

Section: B Policy Learning Results (mentioning)

confidence: 78%

“…We compare the proposed approach with the s.o.t.a. MBRL algorithm specifically designed to deal with partial state measurability of real mechanical systems, MC-PILCO4PMS [11]. MC-PILCO4PMS follows a particle-based policy gradient framework similar to the one depicted in Sec.…”

Section: Policy Structure (mentioning)

confidence: 99%

“…In particular, we experimented on two benchmark systems: a Furuta pendulum, and a ball-and-plate (Figure 8). The objective is to compare the performance obtained by VF-MC-PILCO in these two setups with the results of MC-PILCO4PMS reported in [11]. [Footnote 2: A video of the experiments on real mechanical systems is available at the following link https://youtu.be/Hx3Y1Ib-6Tc.]…”

Section: Experiments On Real Mechanical Systems (mentioning)

confidence: 99%

“…encoders, while velocities can only be estimated from the history of sampled positions. In our previous work [11], we proposed an MBRL algorithm, called MC-PILCO4PMS, specifically tailored to deal with Partially Measurable Systems and take correctly into account the presence of online and offline state observers. It proved able to robustly learn from scratch how to control mechanical systems, in both simulated and real environments even when the velocity is not directly measurable.…”

Section: Introduction (mentioning)

confidence: 99%

“…The comparisons are carried against MC-PILCO4PMS because this algorithm was shown to outperform other s.o.t.a. MBRL algorithms in [11].…”

Section: Introduction (mentioning)

confidence: 99%


Learning Control from Raw Position Measurements

Amadio, Libera, Nikovski, et al. 2023

Preprint


We propose a Model-Based Reinforcement Learning (MBRL) algorithm named VF-MC-PILCO, specifically designed for application to mechanical systems where velocities cannot be directly measured. This circumstance, if not adequately considered, can compromise the success of MBRL approaches. To cope with this problem, we define a velocity-free state formulation which consists of the collection of past positions and inputs. Then, VF-MC-PILCO uses Gaussian Process Regression to model the dynamics of the velocity-free state and optimizes the control policy through a particle-based policy gradient approach. We compare VF-MC-PILCO with our previous MBRL algorithm, MC-PILCO4PMS, which handles the lack of direct velocity measurements by modeling the presence of velocity estimators. Results on both simulated (cart-pole and UR5 robot) and real mechanical systems (Furuta pendulum and a ball-and-plate rig) show that the two algorithms achieve similar results. Conveniently, VF-MC-PILCO does not require the design and implementation of state estimators, which can be a challenging and time-consuming activity to be performed by an expert user.
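As a rough illustration of the state formulation described in this abstract, the Python sketch below stacks past positions and inputs into one vector per time step. The function name `velocity_free_states`, the single `memory` window shared by positions and inputs, and the newest-first ordering are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def velocity_free_states(positions, inputs, memory):
    """Build one stacked state per time step from past positions and inputs.

    positions: array of shape (T, n_q), sampled positions.
    inputs:    array of shape (T, n_u), applied control inputs.
    memory:    number of past samples kept in each state (assumed window length).

    Returns an array of shape (T - memory + 1, memory * (n_q + n_u)).
    """
    positions = np.asarray(positions, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    if positions.ndim != 2 or inputs.ndim != 2:
        raise ValueError("positions and inputs must be 2-D arrays")
    n_steps = positions.shape[0]
    if inputs.shape[0] != n_steps:
        raise ValueError("positions and inputs must have the same length")
    if not 1 <= memory <= n_steps:
        raise ValueError("memory must be between 1 and the trajectory length")

    rows = []
    for t in range(memory - 1, n_steps):
        window = slice(t - memory + 1, t + 1)
        # Reverse each window so the most recent sample comes first.
        past_q = positions[window][::-1].ravel()
        past_u = inputs[window][::-1].ravel()
        rows.append(np.concatenate([past_q, past_u]))
    return np.stack(rows)
```

No velocity appears in the resulting vectors; a regression model trained on them would have to infer velocity-like information from the differences between consecutive positions.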

Controlled Gaussian process dynamical models with application to robotic cloth manipulation

Amadio, Delgado-Guerrero, Colomé, et al. 2023

Int. J. Dynam. Control


Over the last years, significant advances have been made in robotic manipulation, but still, the handling of non-rigid objects, such as cloth garments, is an open problem. Physical interaction with non-rigid objects is uncertain and complex to model. Thus, extracting useful information from sample data can considerably improve modeling performance. However, the training of such models is a challenging task due to the high-dimensionality of the state representation. In this paper, we propose Controlled Gaussian Process Dynamical Models (CGPDMs) for learning high-dimensional, nonlinear dynamics by embedding them in a low-dimensional manifold. A CGPDM is constituted by a low-dimensional latent space, with an associated dynamics where external control variables can act and a mapping to the observation space. The parameters of both maps are marginalized out by considering Gaussian Process priors. Hence, a CGPDM projects a high-dimensional state space into a smaller dimension latent space, in which it is feasible to learn the system dynamics from training data. The modeling capacity of CGPDM has been tested in both a simulated and a real scenario, where it proved to be capable of generalizing over a wide range of movements and confidently predicting the cloth motions obtained by previously unseen sequences of control actions.
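The structure described in this abstract can be sketched as a generic latent state-space model. The symbols below (latent state x_t, control u_t, observation y_t, maps f and g, noise terms, dimensions d and D) are notation assumed for this illustration, and the first-order form of the dynamics is likewise an assumption, not the paper's stated model.

```latex
% Assumed notation: x_t latent state, u_t control input, y_t observation.
\begin{align}
  x_{t+1} &= f(x_t, u_t) + n_{x,t}, & x_t &\in \mathbb{R}^{d}, \\
  y_t     &= g(x_t) + n_{y,t},      & y_t &\in \mathbb{R}^{D}, \quad d \ll D.
\end{align}
% Placing Gaussian Process priors on f and g lets their parameters be
% marginalized out, as the abstract states.
```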
