“…in human-robot interaction. This information can then be fused with additional sensor modalities for higher-level planning [11]. To acquire the unknown trajectory of the gear side, a reinforcement learning technique called Policy Improvement with Path Integrals (PI²) [12] will be used.…”