2008
DOI: 10.1007/s10458-008-9046-9
Analyzing and visualizing multiagent rewards in dynamic and stochastic domains

Abstract: The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with strong signal-to-noise ratios). This step is particularly helpful in continuous, dynamic, stochastic…

Cited by 86 publications (79 citation statements) · References 21 publications

Citation statements (ordered by relevance):
“…Thus, an agent acting to increase the Difference Reward will also act to increase the global reward. This property is termed factoredness [5]. Further, because the Difference Reward only depends on the actions of agent i, noise from other agents is reduced in the feedback given by D i .…”
Section: Difference Reward
confidence: 99%
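The factoredness property referenced in the excerpt above has a standard formal statement in the difference-reward literature; the sketch below uses the usual notation (G for the system-level reward, g_i for agent i's reward, z for the joint state-action) and is an illustration rather than a quotation from the cited paper.

```latex
% Factoredness (informal sketch, standard notation): a reward g_i is
% factored with respect to G when, for any two joint states that differ
% only in agent i's contribution, the two rewards move in the same direction.
\[
  \mathrm{sgn}\bigl(g_i(z) - g_i(z')\bigr)
  \;=\;
  \mathrm{sgn}\bigl(G(z) - G(z')\bigr)
  \qquad \text{for all } z, z' \text{ with } z_{-i} = z'_{-i}.
\]
```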
“…The weights of the neural network are adjusted through an evolutionary search algorithm [3,2] for ranking and subsequently locating successful networks within a population [12,3]. The algorithm maintains a population of ten networks, utilizes mutation to modify individuals, and ranks them based on a performance metric specific to the domain.…”
Section: Robot Capabilities
confidence: 99%
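As a rough illustration of the loop described in that excerpt (a population of ten networks, mutation of individuals, ranking by a domain-specific metric), here is a minimal sketch under stated assumptions: `Network`, `evaluate`, and the mutation scale are hypothetical stand-ins, not details taken from the cited papers.

```python
import random
import copy

POP_SIZE = 10          # the excerpt mentions a population of ten networks
MUTATION_STD = 0.1     # assumed mutation scale (not given in the excerpt)

class Network:
    """Toy fixed-topology network represented by a flat weight vector."""
    def __init__(self, n_weights=32):
        self.weights = [random.gauss(0.0, 1.0) for _ in range(n_weights)]

def mutate(net: Network) -> Network:
    """Copy a network and perturb its weights with Gaussian noise."""
    child = copy.deepcopy(net)
    child.weights = [w + random.gauss(0.0, MUTATION_STD) for w in child.weights]
    return child

def evaluate(net: Network) -> float:
    """Placeholder for the domain-specific performance metric."""
    return -sum(w * w for w in net.weights)  # toy objective

def evolve(generations: int = 100) -> Network:
    population = [Network() for _ in range(POP_SIZE)]
    for _ in range(generations):
        # Rank the population by performance, keep the better half,
        # and refill with mutated copies of the survivors.
        population.sort(key=evaluate, reverse=True)
        survivors = population[:POP_SIZE // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(POP_SIZE - len(survivors))]
    return max(population, key=evaluate)
```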
“…• The difference evaluation reflects the impact a robot has on the full system [3,2]. By removing the value of the system evaluation where robot i is inactive, the difference evaluation computes the value added by the observations of robot i alone.…”
Section: Robot Objectives
confidence: 99%
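The counterfactual-removal idea in that excerpt can be sketched directly: score the full system, then score it again with robot i inactive, and take the difference. The names below (`global_eval`, `observations`) are illustrative, not taken from the cited papers.

```python
from typing import Any, Callable, Sequence

def difference_evaluation(global_eval: Callable[[Sequence[Any]], float],
                          observations: Sequence[Any],
                          i: int) -> float:
    """D_i = G(z) - G(z with robot i inactive)."""
    full_value = global_eval(observations)
    # Counterfactual system: drop robot i's observations entirely,
    # i.e. evaluate the system as if robot i had been inactive.
    without_i = [obs for j, obs in enumerate(observations) if j != i]
    return full_value - global_eval(without_i)
```

Because the remaining robots' observations are unchanged between the two evaluations, their contributions largely cancel in the subtraction, which is the noise-reduction property emphasized in the earlier excerpts.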
“…As a consequence, in this work, we use the difference reward as a starting point for the reward an agent receives after each step. Earlier work has shown that the difference reward significantly outperforms both agents receiving a purely local reward and all agents receiving the same system reward [3,2,33,32,36]. The difference reward is given by:…”
Section: Basic Agent Learning
confidence: 99%
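The quoted passage is cut off before the formula it introduces. For context, the form the difference reward usually takes in this line of work (a sketch, not a verbatim reproduction of the citing paper's equation) is:

```latex
\[
  D_i(z) \;=\; G(z) \;-\; G(z_{-i} + c_i),
\]
% where G is the system-level reward, z_{-i} is the joint state-action
% with agent i's contribution removed, and c_i is a fixed counterfactual
% term (often the null action) substituted for agent i.
```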
“…In these cases, the learning needs of the agents are modified to account for their presence in a larger system [2,11,13,22,35,37]. However, though these methods have yielded tremendous advances in multiagent learning, they are principally based on an agent trying an action, receiving an evaluation of that action, and updating its own estimate on the "value" of taking that action in that state.…”
Section: Introduction
confidence: 99%
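The loop that excerpt describes, trying an action, receiving an evaluation, and updating the estimated value of that state-action pair, is the familiar action-value update. A minimal tabular sketch (Q-learning style; the parameters and helper names are illustrative, not taken from the cited papers):

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # assumed learning parameters

Q = defaultdict(float)  # maps (state, action) -> estimated value

def choose_action(state, actions):
    """Epsilon-greedy selection over the current value estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """Shift the estimate toward the received evaluation plus the bootstrap."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```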