Reinforcement learning (RL) offers a variety of algorithms for in-situation behaviour synthesis [1]. The Q-learning technique [2] is certainly the most widely used of the RL methods. Multilayer perceptron implementations of Q-learning were proposed early on [3], motivated by their restricted memory requirements and their generalization capability [4]. The self-organizing map implementation of Q-learning is more recent [5]. We propose to study this implementation and to discuss its interest compared to a multilayer perceptron implementation or to more classical ones. Experiments are performed in the real world with the miniature robot Khepera [6].
Q-learning

Reinforcement learning synthesises a mapping function between situations and actions by maximising a reinforcement signal. Q-learning algorithms store the expected reinforcement value associated with each situation-action pair. Three different functions are involved: memorisation, exploration and updating [4]. In response to the present situation, an action is proposed by the robot's memory; this action is the one with the best rewarding probability. However, this proposition is occasionally modified to allow an extensive exploration of the situation-action space. After the robot executes the action in the real world, a reinforcement function provides a reinforcement value. This value, a simple qualitative criterion (+1, -1 or 0), is used by the updating algorithm to adjust the reward value Q associated with the situation-action pair. Learning is incremental, because the examples are acquired in real situations.
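To make the three functions concrete, the following is a minimal tabular sketch of this loop, not the implementation studied in the paper; the situation and action space sizes and the alpha, gamma and epsilon values are illustrative assumptions.

```python
import random

# Hypothetical discretized spaces; the paper's sensor coding differs.
n_situations, n_actions = 16, 4
Q = [[0.0] * n_actions for _ in range(n_situations)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

def select_action(s):
    # Memorisation + exploration: propose the best-rewarded action,
    # occasionally replaced by a random one to explore the space.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])

def update(s, a, r, s_next):
    # Updating: move Q(s, a) toward the qualitative reinforcement
    # r in {+1, -1, 0} plus the discounted best value of the next situation.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
```

In a lookup-table form such as this, memory grows with the product of the situation and action space sizes, which is precisely the cost that the multilayer perceptron and self-organizing map implementations are meant to reduce.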