“…Because it relies on a simple, final reward, reinforcement learning has found applications in interaction scenarios where an agent receives feedback from a user at the end of a sequence of actions, such as dialogue management [11], visual homing and navigation [12,13,14,15,16], human-computer/robot interaction [17], robot navigation [18,19], and learning skills in the RoboCup Soccer Competition [20,21,22]. There have already been initial attempts to explore reinforcement learning for restricted tasks in scheduling, routing, and network optimisation.…”