“…Reinforcement learning (RL) involves performing many exploratory actions, often with a degree of randomness, which can damage the robot or its environment. This problem has previously been addressed by learning in simulation [1], [2], [3], safe exploration [4], [5], imitation learning [6], [7] and learning from demonstration (LfD) [8], [9]. However, some of these solutions do not guarantee safety, while others have difficulty transferring from simulated to real environments.…”