Reinforcement learning (RL) offers a variety of algorithms for in-situation behaviour synthesis [1]. The Q-learning technique [2] is certainly the most widely used of the RL methods. Multilayer perceptron implementations of Q-learning were proposed early on [3], motivated by their restricted memory requirements and their generalization capability [4]. The self-organizing map implementation of Q-learning is more recent [5]. We propose to study this implementation and to discuss its interest compared to a multilayer perceptron implementation or to more classical ones. Experiments are performed in the real world with the miniature robot Khepera [6].
Q-learning

Reinforcement learning synthesises a mapping function between situations and actions by maximising a reinforcement signal. Q-learning algorithms store the expected reinforcement value associated with each situation-action pair. Three different functions are involved: memorisation, exploration and updating [4]. In response to the present situation, an action is proposed by the robot's memory; this action is the one with the best rewarding probability. However, this proposition is occasionally modified to allow an extensive exploration of the situation-action space. After the robot executes the action in the real world, a reinforcement function provides a reinforcement value. This value, a simple qualitative criterion (+1, -1 or 0), is used by the updating algorithm to adjust the reward value Q associated with the situation-action pair. Learning is incremental, because the examples are acquired in real situations.
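To make the three functions concrete, the following is a minimal tabular sketch of this loop, not the implementation studied in the paper; the situation and action space sizes and the alpha, gamma and epsilon values are illustrative assumptions.

```python
import random

# Hypothetical discretized spaces; the paper's sensor coding differs.
n_situations, n_actions = 16, 4
Q = [[0.0] * n_actions for _ in range(n_situations)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

def select_action(s):
    # Memorisation + exploration: propose the best-rewarded action,
    # occasionally replaced by a random one to explore the space.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])

def update(s, a, r, s_next):
    # Updating: move Q(s, a) toward the qualitative reinforcement
    # r in {+1, -1, 0} plus the discounted best value of the next situation.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
```

In a lookup-table form such as this, memory grows with the product of the situation and action space sizes, which is precisely the cost that the multilayer perceptron and self-organizing map implementations are meant to reduce.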