An autonomous explore/exploit strategy

McMahon, Alex D.; Scott, Dan; Browne, Will N.

doi:10.1145/1102256.1102280

Cited by 10 publications

(7 citation statements)

References 4 publications

“…Similar to affect is the notion of comfort or safety, which has also been proposed to influence exploration behavior in robots (Likhachev & Arkin, 2000). Affect has been used in evolutionary algorithms to develop exploration/exploitation strategies in dynamic choice trials (McMahon, Scott, Baxter, & Browne, 2006), and affect has been embedded into the reinforcement-learning algorithm where reward is based on the happiness and sadness of the agent (Salichs & Malfaz, 2006).…”

Section: Related Work (mentioning)

confidence: 99%

The Neuromodulatory System: A Framework for Survival and Adaptive Behavior in a Challenging World

2008


Biological organisms have the ability to respond quickly to an ever-changing world. Because this adaptability is so critical for survival, all vertebrates have sub-cortical structures, which comprise the neuromodulatory systems, to regulate fundamental behavior and drive decision making in response to environmental events. In the vertebrate, there are separate neuromodulators that respond to threats, reward anticipation, novelty, and attentional effort. However, each of these neuromodulatory systems has a similar effect, that is, to cause an organism to be decisive when environmental conditions call for such actions, and allow the organism to be more exploratory when there are no pressing events. In this article, it is proposed that principles of the neuromodulatory system could provide a framework for controlling artificial agents that may improve current artificial agent behavior. These agents would operate autonomously, effectively explore their environment, and be decisive when environmental conditions call for action.

“…Strongly related to our approach to affect-modulated exploration is the research by McMahon, Scott, Baxter, and Browne (2006). The authors show how the discrete choice between exploration and exploitation trials can be controlled by a probability value that is derived from measures inspired by affect.…”

Section: Related Work (mentioning)

confidence: 99%

Affect, Anticipation, and Adaptation: Affect-Controlled Selection of Anticipatory Simulation in Artificial Adaptive Agents

2007


Emotion plays an important role in thinking. In this article we study affective control of the amount of simulated anticipatory behavior in adaptive agents using a computational model. Our approach is based on model-based reinforcement learning (RL) and inspired by the simulation hypothesis (Cotterill, 2001; Hesslow, 2002). The simulation hypothesis states that thinking is internal simulation of behavior using the same sensory-motor systems as those used for overt behavior. Here, we study the adaptiveness of an artificial agent, when action-selection bias is induced by an affect-controlled amount of simulated anticipatory behavior. To this end, we introduce an affect-controlled simulation-selection mechanism that uses the predictions of the agent's RL model to select anticipatory behaviors for simulation. Based on experiments with adaptive agents in two nondeterministic partially observable gridworlds we conclude that (1) internal simulation has an adaptive benefit and (2) affective control can reduce the amount of simulation needed for this benefit. This is specifically the case if the following relation holds: positive affect decreases the amount of simulation towards simulating the best potential next action, while negative affect increases the amount of simulation towards simulating all potential next actions. In essence we use artificial affect to control mental exploration versus exploitation. Thus, agents "feeling positive" can think ahead in a narrow sense and free up working memory resources, while agents "feeling negative" must think ahead in a broad sense and maximize usage of working memory. Our results are consistent with several psychological findings on the relation between affect and learning, and contribute to answering the question of when positive versus negative affect is useful during adaptation.
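The relation stated in this abstract (positive affect narrows simulation toward the best next action, negative affect widens it toward all next actions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited article: affect is treated as a scalar in [0, 1], the function name `actions_to_simulate` is hypothetical, and the linear mapping from affect to the number of simulated actions is a stand-in for whatever mechanism the authors use.

```python
import math


def actions_to_simulate(predicted_values, affect):
    """Return the actions chosen for internal simulation, best first.

    predicted_values: dict mapping action -> predicted value
                      (hypothetical output of the agent's learned model)
    affect: scalar in [0, 1]; 1.0 is most positive, 0.0 most negative
            (an assumed scale, not the article's definition)
    """
    if not 0.0 <= affect <= 1.0:
        raise ValueError("affect must lie in [0, 1]")
    n = len(predicted_values)
    # Assumed linear interpolation: affect 1 -> simulate 1 action,
    # affect 0 -> simulate all n actions.
    k = max(1, math.ceil((1.0 - affect) * n))
    # Rank actions from highest to lowest predicted value.
    ranked = sorted(predicted_values, key=predicted_values.get, reverse=True)
    return ranked[:k]
```

With four candidate actions, an affect of 1.0 selects only the highest-valued action, 0.5 selects the top two, and 0.0 selects all four.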

“…Selection mechanisms like ε-greedy have been applied to classifier systems to manage the exploration/exploitation trade-off (e.g. [11,14,28]). However, such mechanisms are typically used, as in TD methods, to select among individual actions, not to allocate evaluations among an entire population.…”

Section: Related and Future Work (mentioning)

confidence: 99%

Evolutionary Computation for Reinforcement Learning

Whiteson

2012

Adaptation, Learning, and Optimization


In reinforcement learning, an agent interacting with its environment strives to learn a policy that specifies, for each state it may encounter, what action to take. Evolutionary computation is one of the most promising approaches to reinforcement learning but its success is largely restricted to off-line scenarios. In on-line scenarios, an agent must strive to maximize the reward it accrues while it is learning. Temporal difference (TD) methods, another approach to reinforcement learning, naturally excel in on-line scenarios because they have selection mechanisms for balancing the need to search for better policies (exploration) with the need to accrue maximal reward (exploitation). This paper presents a novel way to strike this balance in evolutionary methods by borrowing the selection mechanisms used by TD methods to choose individual actions and using them in evolution to choose policies for evaluation. Empirical results in the mountain car and server job scheduling domains demonstrate that these techniques can substantially improve evolution's on-line performance in stochastic domains.
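The idea in this abstract, borrowing an action-selection mechanism from TD methods and using it to choose which policy in an evolving population is evaluated next, can be sketched with ε-greedy selection. This is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the use of a running average reward as each policy's fitness estimate, and the fixed `epsilon` are all hypothetical choices made for illustration.

```python
import random


def epsilon_greedy_policy_choice(avg_fitness, epsilon, rng=random):
    """Pick the index of the policy to evaluate in the next episode.

    avg_fitness: list of running average rewards, one per policy
    epsilon: probability of exploring, i.e. evaluating a random policy
    rng: source of randomness exposing random() and randrange()
    """
    if rng.random() < epsilon:
        # Explore: spend this episode on a uniformly random policy.
        return rng.randrange(len(avg_fitness))
    # Exploit: re-evaluate the policy with the best average so far.
    return max(range(len(avg_fitness)), key=avg_fitness.__getitem__)


def update_average(avg, count, reward):
    """Fold one more episode reward into a running mean.

    Returns the new (average, episode count) pair.
    """
    count += 1
    return avg + (reward - avg) / count, count
```

Selecting among whole policies rather than individual actions means each choice costs a full episode, which is why the fitness estimate is kept as a running mean over the episodes a policy has received.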
