“…For example, in our previous studies on autonomous robot systems such as an intelligent wheelchair, we used RL algorithms to let an agent learn how to avoid obstacles and evolve cooperative behavior with other robots (Hamagami & Hirata, 2004). Furthermore, RL has been widely used to solve the elevator dispatching problem (Crites & Barto, 1996), the air-conditioning management problem (Dalamagkidisa et al., 2007), the process control problem (S. Syafiie et al., 2008), etc. However, in most cases, RL algorithms have been applied successfully only in ideal situations that can be modeled as Markov decision processes (MDPs).…”