“…By taking action 'a' in state 's', the algorithm evaluates the reward it receives as feedback and sequentially updates its Q-values from those rewards. Given an acceptable learning period, the Q-learning algorithm performs very well, as evidenced by its successful application to collision avoidance, homing, and robot cooperation [4,6,7,9,13].…”
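
The update described in the excerpt can be illustrated with a minimal sketch of standard tabular Q-learning. The excerpt does not specify an environment or any hyperparameters, so everything concrete here is an assumption made for illustration: the five-cell corridor, the `step` function, the learning rate `alpha`, the discount factor `gamma`, and the exploration rate `epsilon`. It is not the cited authors' implementation.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # hypothetical action set: move left or right
GOAL = 4            # hypothetical goal cell in a corridor of states 0..4


def step(state, action):
    """Toy environment (assumed): reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """One Q-learning update: move Q(s, a) toward r + gamma * max_b Q(s_next, b)."""
    best_next = 0.0 if done else max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def train(episodes=500, epsilon=0.2, seed=0):
    """Run epsilon-greedy episodes, updating Q after every single step."""
    rng = random.Random(seed)
    q = defaultdict(float)  # all Q-values start at 0.0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                # exploit, breaking ties between equal Q-values at random
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s_next, r, done = step(s, a)
            q_update(q, s, a, r, s_next, done)
            s = s_next
    return q


def greedy_policy(q):
    """Best-known action for every non-goal state."""
    return {s: max(ACTIONS, key=lambda b: q[(s, b)]) for s in range(GOAL)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy moves right in every cell, which is the shortest path to the goal. The sequential character mentioned in the excerpt shows up in `train`, where `q_update` is called once per transition rather than once per episode.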